Preface



The 10th International Conference on the Principles and Practice of Constraint
Programming (CP 2004) was held in Toronto, Canada, from September 27 to
October 1, 2004. Information about the conference can be found on the Web at
http://ai.uwaterloo.ca/~cp2004/

Constraint programming (CP) is about problem modelling, problem solving, 
programming, optimization, software engineering, databases, visualization, user 
interfaces, and anything to do with satisfying complex constraints. It reaches 
into mathematics, operations research, artificial intelligence, algorithms, com- 
plexity, modelling and programming languages, and many aspects of computer 
science. Moreover, CP is never far from applications, and its successful use in 
industry and government goes hand in hand with the success of the CP research 
community. 

Constraint programming continues to be an exciting, flourishing and growing 
research field, as the annual CP conference proceedings amply witness. This year, 
from 158 submissions, we chose 46 to be published in full in the proceedings. 
Instead of selecting one overall best paper, we picked out four “distinguished” 
papers - though we were tempted to select at least 12 such papers. In addition 
we included 16 short papers in the proceedings - these were presented as posters 
at CP 2004. 

This volume includes summaries of the four invited talks of CP 2004. Two 
speakers from industry were invited. However, these were no ordinary industrial
representatives, but two of the leading researchers in the CP community: Helmut
Simonis of Parc Technologies, until its recent takeover by Cisco Systems; and
Jean-François Puget, Director of Optimization Technology at ILOG. The other
two invited speakers are also big movers and shakers in the research community. 
We were delighted to welcome Bart Selman, previously at AT&T and now at 
Cornell, and Andreas Podelski, previously at Microsoft Research and now at the 
University of the Saarland. 

A doctoral program was again organized to expose students to CP 2004, 
and 22 doctoral presentations are summarized as 1-page papers in the proceed- 
ings. Michela Milano brought to the doctoral program all the energy, tact, and 
organization that the CP community has come to recognise in her. 

Finally, nine applications of CP were demonstrated at CP 2004, and 1-page 
descriptions of these demos have been included here. 

The day before CP 2004, nine workshops were held, each with its own pro-
ceedings. Four tutorials were presented during the conference: “Modelling Prob-
lems in Constraint Programming” by Jean-Charles Régin; “Online Stochastic
Optimisation” by Pascal Van Hentenryck and Russell Bent; “Symmetry Break-
ing in Constraint Programming” by Ian Gent and Jean-François Puget; and
“Distributed Constraints - Algorithms, Performance, Communication” by Amnon
Meisels. Barry O’Sullivan brought this excellent program together with his
unique combination of charm and competence.

For conference publicity I’m very grateful for the hard work of Gilles Pesant, 
who took it on at the same time as moving his family across to Europe. He 
managed both with consummate efficiency. 

Many thanks to the program committee, who reviewed and discussed all the 
submissions, and got nothing for their efforts but a free lunch. Nevertheless PC 
members took an enormous amount of trouble and the PC meeting was intense 
but also a lot of fun. 

In preparing the proceedings, I’m grateful to Neil Yorke-Smith for generously
volunteering to manage all the copyright forms, and Sevim Zongur who aided 
me in time of need. 

Peter Van Beek and Fahiem Bacchus were nothing short of marvellous. I 
dread to think how much time and trouble they spent on budgeting, planning, 
booking, covering up for me and making it all work. 

Finally heartfelt thanks to the many sponsors who make it possible for the 
conference to invite speakers, fund students and continue successfully for year af- 
ter year. We are very grateful to IISI, Cornell; AAAI; Parc Technologies; ILOG; 
SICS; CoLogNET; Microsoft Research; NICTA, Australia; 4C, Cork; Dash Op- 
timization; and the CPOC. 
Constraints in Program Analysis
and Verification 
(Abstract of Invited Talk) 



Andreas Podelski 

Max-Planck-Institut für Informatik
Saarbrücken, Germany

Program verification is a classical research topic in core computer science. Re-
cent developments have led to push-button software verification tools that are
used industrially, e.g. to check interface specifications of device drivers. These
developments are based on program analysis, model checking and constraint 
solving. 

This Talk. After a short introduction to classical program analysis, we will 
illustrate how constraint solving is used to overcome two problems in the use 
of abstraction for verifying properties of program executions. The first problem 
is infinite precision: how can we abstract away enough irrelevant details to be 
efficient, but keep enough details to preserve the property we want to prove? 
The second problem is infinite error length: how can one abstract a program so 
that it preserves liveness properties? 

Program Analysis. The term ‘program analysis’ refers to any method for ver- 
ifying execution properties (“dynamic properties”) that is or can be used in 
a compiler (“statically”). The original motivation has been compiler optimiza- 
tion. An optimization relies on a redundancy, which is a property that is valid 
throughout runtime. A typical verification problem is the question: “is the value 
of variable x always constant?” or: “is the value of variable x always different 
from 0?” The first question is motivated by improving efficiency, the second by
avoiding an execution error.

Program Analysis vs. Model Checking vs. Program Verification. To fix our termi- 
nology, we distinguish the three prominent approaches to verification by whether 
they are (1) automated and (2) whether they deal with general purpose and hence 
infinite programs (where infinite refers to the space of data values for program 
variables). Program analysis satisfies (1) and (2), model checking satisfies (1) 
but not (2), and program verification (by applying deductive proof rules and 
possibly checking proofs mechanically) satisfies (2) but not (1). Program anal- 
ysis uses abstraction to circumvent the manual construction of a finite model 
resp. the manual construction of an auxiliary assertion (such as an induction 
hypothesis). In fact, the methodic speciality of program analysis is abstraction. 
But what exactly is abstraction? 

Program Analysis in 2 Steps. To simplify, we divide program analysis into two 
steps: (1) the transformation of the program P into a finite abstract program P#,
and (2) the model checking of the property for P#. The finiteness of P# refers to
the number of its ‘abstract’ states.

The example program P shown below on the left (an extract from a Windows 
NT device driver) is transformed into the program shown below on the right. 
The states of P are partitioned according to the values of the expressions (z = 0)
and (x = y), which are recorded by the Boolean variables b1 resp. b2 of P#. An
abstract state is thus a bitvector, which in fact stands for an equivalence class
(consisting of all states that satisfy the predicates or their negation, according
to the value of the bit for the predicate).

Step (1) transforms each update statement in P to one in P#. Step (2) follows
all execution traces of P# until its (finite) state space is exhaustively explored.
Step (2) determines that Line 9 in P# cannot be reached. This proves that P
satisfies the correctness property (Line 9 in P cannot be reached), because P#
simulates every possible execution of P.



        Program P                        Abstract program P#

[1]  do {                           [1]  do
[2]    z = 0;                       [2]    b1 := 1;
[3]    x = y;                       [3]    b2 := 1;
[4]    if (w) {                     [4]    if (*) then
[5]      x++;                       [5]      b2 := if (b2) then 0 else *;
[6]      z = 1;                     [6]      b1 := 0;
       }                                 fi
[7]  } while (x != y)               [7]  while (!b2)
[8]  if (z) {                       [8]  if (!b1) then
[9]    assert(0); }                 [9]    assert(0);
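
Step (2) can be illustrated on this example by an exhaustive exploration of the abstract state space. The Python sketch below is only an illustration and not the tool discussed in the talk: the encoding of an abstract state as (line, b1, b2), the function names, and the extra line number 10 standing for normal exit are assumptions of the sketch. The guard at Line 7 is read as the negation of b2, since the concrete loop continues while x != y and b2 records (x = y).

```python
from collections import deque

def successors(state):
    """Abstract successor states of (line, b1, b2) in the Boolean program."""
    line, b1, b2 = state
    if line == 1:                       # do
        return [(2, b1, b2)]
    if line == 2:                       # b1 := 1
        return [(3, True, b2)]
    if line == 3:                       # b2 := 1
        return [(4, b1, True)]
    if line == 4:                       # if (*): both branches are possible
        return [(5, b1, b2), (7, b1, b2)]
    if line == 5:                       # b2 := if (b2) then 0 else *
        values = [False] if b2 else [False, True]
        return [(6, b1, v) for v in values]
    if line == 6:                       # b1 := 0
        return [(7, False, b2)]
    if line == 7:                       # loop again while not b2
        return [(1, b1, b2)] if not b2 else [(8, b1, b2)]
    if line == 8:                       # if (!b1) then goto 9, else exit (10)
        return [(9, b1, b2)] if not b1 else [(10, b1, b2)]
    return []                           # lines 9 and 10 have no successors

def reachable_states():
    """Breadth-first exploration from all initial valuations of b1 and b2."""
    initial = [(1, b1, b2) for b1 in (False, True) for b2 in (False, True)]
    seen = set(initial)
    queue = deque(initial)
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

if __name__ == "__main__":
    lines = {line for line, _, _ in reachable_states()}
    print("Line 9 reachable:", 9 in lines)
```

Because the abstract program has only finitely many states, the exploration always terminates, and the absence of Line 9 from the reachable set is what carries over to the concrete program P.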



First Problem: Infinite Precision. The abstraction is specified manually. I.e., 
the choice of the expressions (z = 0) and (x = y) is based on judicious insights
regarding the program and the property. That is, that choice must be done 
anew for each program and each property. Moreover, once the abstraction is 
fixed (by a choice of expressions), the abstract statement corresponding to each 
statement in P is specified manually (e.g. b2 := if (b2) then 0 else * for 
the concrete statement x++). This must be done anew for each abstraction. 

We take a simpler example to explain how constraint solving may come
in to avoid the manual specification of an abstract statement. Classically, the 
table for the abstract addition of the three abstract values plus, zero, minus is 
specified manually. This is not necessary if one uses constraint solving to infer 
that 

x > 0, y > 0 implies x + y > 0

and so on for the other entries in the table for the abstract addition of plus, zero, 
minus. 
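
As a small illustration of inferring the table rather than writing it by hand, the following Python sketch derives every entry of the abstract addition table. It is a simplification under stated assumptions: the variables range over the integers, each abstract value is represented by an interval, and plain interval reasoning stands in for a general constraint solver. The names INTERVALS and abstract_add are hypothetical.

```python
import math

# Each abstract value denotes an interval of integers (assumption of the sketch).
INTERVALS = {
    "minus": (-math.inf, -1),
    "zero": (0, 0),
    "plus": (1, math.inf),
}

def abstract_add(a, b):
    """Abstract values that x + y may take when x is in a and y is in b.

    The sum of two integer intervals is the interval of the summed bounds,
    so an abstract value is possible exactly when its interval overlaps it.
    """
    lo = INTERVALS[a][0] + INTERVALS[b][0]
    hi = INTERVALS[a][1] + INTERVALS[b][1]
    return sorted(
        name
        for name, (low, high) in INTERVALS.items()
        if lo <= high and low <= hi
    )

if __name__ == "__main__":
    for a in INTERVALS:
        for b in INTERVALS:
            print(f"{a:5} + {b:5} -> {abstract_add(a, b)}")
```

The entry for plus and plus contains only plus, which is the implication stated above, while the entry for plus and minus contains all three abstract values.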

In the talk, we will explain the constraint-based construction of an abstract 
statement corresponding to each concrete statement in full generality. We will 
also explain how that construction lends itself to counterexample-guided abstrac-
tion refinement, where expressions such as (z = 0) and (x = y) are thrown in
incrementally in an iterative process (a process that is iterated until either a 
counterexample is found in the concrete program P or the property is proven 
correct). This solves the problem of infinite precision. 
Second Problem: Infinite Error Length. Let us take an example of a concurrent
program for which we want to establish a liveness property, here that every
execution eventually reaches ℓ4 (and thus terminates). To give a more interesting
example of a liveness property, from an interface specification of a device driver:
“each time a lock is acquired, it will eventually be returned”. The violation of
a liveness property can be exhibited only on an infinite execution trace (and
that is exactly the formal criterion that distinguishes it from a safety property).
For liveness properties, one adds fairness assumptions to model that enabled
transitions will eventually be taken (here, the transition from m0 to m1).



P1 ::                              P2 ::

  ℓ0: while x = 1 do                 m0: x := 0
  ℓ1:   y := y + 1                   m1:
  ℓ2: while y > 0 do
  ℓ3:   y := y − 1
  ℓ4:



Finite-State Abstraction Does Not Preserve Liveness Properties. We take the 
previous example program with two threads; the variable y is set to some ar-
bitrarily large value n and is then continually decremented, 0 → n → n − 1 →
... → 1 → 0. No terminating finite-state system is an abstraction of this pro-
gram. I.e., every sufficiently long computation of the concrete program (with
length greater than the number of abstract states) will result in a computation 
of the abstract system that contains a loop. 

We thus need to come up with an extension of the idea of abstract states and 
their representation by constraints: abstract transitions and their representation 
by transition constraints. 

Transition Constraints. An update statement such as x := x-1 corresponds to the
transition constraint x' = x − 1. A state s leads to state s' under the execution
of the update statement if and only if the pair of states defines a solution for
the corresponding transition constraint (where s and s' define the values of the
unprimed respectively primed version of the program variables).

An if statement corresponds to a transition constraint that contains the if
expression as a conjunct (a constraint, i.e. the special case of a transition con-
straint over unprimed variables). For example, if (x>0) {x := x-1} corresponds to
pc = L0 ∧ x > 0 ∧ x' = x − 1 ∧ pc' = L0.

The program statement L0: while (x>0) {x := x-1} L1: (at program label
L0, with L1 in the next line) corresponds to the set (disjunction) of two transition
constraints c1 and c2 below.

c1 ≡ pc = L0 ∧ x > 0 ∧ x' = x − 1 ∧ pc' = L0

c2 ≡ pc = L0 ∧ x ≤ 0 ∧ x' = x ∧ pc' = L1

That is, a program corresponds to a set of transition constraints. A program 
states the relation between pre and post states. Hoare-style reasoning for partial 
and total correctness, and deductive verification and model checking of safety 
and liveness properties of (sequential, recursive, concurrent, . . . ) programs can 
be transferred to reasoning over transition constraints. 
The sequential composition of statements S1 ; S2 corresponds to the logical
operation ∘ over the corresponding transition constraints cS1 and cS2, an opera-
tion that implements the relational composition of the denoted binary relations
over states. We illustrate the operation ∘ on the transition constraints c1 and c2
introduced above.

c1 ∘ c1 ≡ pc = L0 ∧ x > 1 ∧ x' = x − 2 ∧ pc' = L0
c1 ∘ c1 ∘ c2 ≡ pc = L0 ∧ x = 2 ∧ x' = x − 2 ∧ pc' = L1
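
The two compositions can be checked mechanically. The Python sketch below is a bounded illustration, not a symbolic procedure: transition constraints are represented as predicates over pairs of states (pc, x), the integer variable x is restricted to a small range, and composition is computed by searching for an intermediate state. The range, the helper names, and the reading of the exit condition of the loop as x ≤ 0 are assumptions of the sketch.

```python
from itertools import product

XS = range(-5, 6)  # bounded stand-in for the integers
STATES = [(pc, x) for pc in ("L0", "L1") for x in XS]

def c1(s, t):
    """pc = L0 and x > 0 and x' = x - 1 and pc' = L0"""
    return s[0] == "L0" and s[1] > 0 and t[1] == s[1] - 1 and t[0] == "L0"

def c2(s, t):
    """pc = L0 and x <= 0 and x' = x and pc' = L1"""
    return s[0] == "L0" and s[1] <= 0 and t[1] == s[1] and t[0] == "L1"

def compose(r1, r2):
    """Relational composition: some intermediate state links r1 to r2."""
    return lambda s, t: any(r1(s, m) and r2(m, t) for m in STATES)

def same_relation(r1, r2):
    """True if both constraints have the same solutions on the bounded range."""
    return all(r1(s, t) == r2(s, t) for s, t in product(STATES, STATES))

def expected_c1_c1(s, t):
    return s[0] == "L0" and s[1] > 1 and t[1] == s[1] - 2 and t[0] == "L0"

def expected_c1_c1_c2(s, t):
    return s[0] == "L0" and s[1] == 2 and t[1] == s[1] - 2 and t[0] == "L1"

if __name__ == "__main__":
    print(same_relation(compose(c1, c1), expected_c1_c1))
    print(same_relation(compose(compose(c1, c1), c2), expected_c1_c1_c2))
```

A constraint solver would obtain the same results symbolically by conjoining the two constraints over a shared intermediate copy of the variables and eliminating that copy.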



Abstract Transitions. We used a fixed finite set of expressions over states (which 
are constraints in unprimed variables) to partition the state space into finitely 
many equivalence classes; those were treated as abstract states. We now use a 
fixed finite set of ‘atomic’ transition constraints (which are constraints in un- 
primed and primed variables) to partition the space of state pairs into finitely 
many equivalence classes; we treat those equivalence classes as abstract transi-
tions c#. For example, taking the set {x > 0, x' ≤ x − 1}, we can abstract the
set of all pairs (s, s') such that s leads to s' in an arbitrarily long sequence of
transitions by the two abstract transitions c1# and c2# below.

c1# ≡ pc = L0 ∧ x > 0 ∧ x' ≤ x − 1 ∧ pc' = L0

c2# ≡ pc = L0 ∧ x ≤ 0 ∧ x' ≤ x − 1 ∧ pc' = L1

A transition invariant T is a set of transition constraints that contains the tran- 
sitive closure of the binary transition relation of a program. Partial correctness 
can be proven via the restriction of T to the entry and exit points of a program. 
Termination can be shown via the well-foundedness of each single transition 
constraint in T, which corresponds to the termination of a corresponding single- 
while loop program and can be tested very efficiently (e.g. by a reduction, based 
on Farkas’ Lemma, to a constraint solving problem). General safety and liveness
properties can be handled in a similar manner. 
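
The two conditions in this paragraph, containment of the transitive closure and well-foundedness of each member, can be checked by brute force on a bounded version of the loop L0: while (x>0) {x := x-1} L1:. The Python sketch below is a rough illustration only. The bounded range, all function names, and in particular the coarse constraint exit_part for pairs ending at L1 are assumptions of the sketch and not taken from the talk. On a finite set of states, well-foundedness amounts to the absence of cycles, which replaces here the Farkas' Lemma based test mentioned above.

```python
from itertools import product

XS = range(-4, 5)  # bounded stand-in for the integers
STATES = [(pc, x) for pc in ("L0", "L1") for x in XS]

def step(s, t):
    """One step of the loop: the disjunction of its two transition constraints."""
    (pc, x), (pc2, x2) = s, t
    stay = pc == "L0" and x > 0 and x2 == x - 1 and pc2 == "L0"
    leave = pc == "L0" and x <= 0 and x2 == x and pc2 == "L1"
    return stay or leave

def loop_part(s, t):
    """Abstract transition: pc = L0, x > 0, x' <= x - 1, pc' = L0."""
    return s[0] == "L0" and s[1] > 0 and t[1] <= s[1] - 1 and t[0] == "L0"

def exit_part(s, t):
    """Hypothetical coarse constraint for every pair that ends at L1."""
    return s[0] == "L0" and t[0] == "L1"

def pairs_of(constraint):
    return {(s, t) for s, t in product(STATES, STATES) if constraint(s, t)}

def transitive_closure(pairs):
    closure = set(pairs)
    while True:
        extra = {(s, u) for (s, t) in closure for (t2, u) in closure if t == t2}
        if extra <= closure:
            return closure
        closure |= extra

def is_transition_invariant(candidates):
    """Every pair in the transitive closure of step satisfies some candidate."""
    closure = transitive_closure(pairs_of(step))
    return all(any(c(s, t) for c in candidates) for s, t in closure)

def is_well_founded(constraint):
    """On finitely many states: no state reaches itself via the constraint."""
    return all(s != t for s, t in transitive_closure(pairs_of(constraint)))

if __name__ == "__main__":
    T = [loop_part, exit_part]
    print(is_transition_invariant(T), all(is_well_founded(c) for c in T))
```

Both checks succeed for this candidate set, which on the bounded range corresponds to the termination argument: x is bounded below and strictly decreases along loop_part, and no pair of exit_part can be followed by another one.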

The construction of a transition invariant starts with the transition con-
straints ci that correspond to the single statements of the program; iteratively,
it takes each (new) transition constraint τ in T, composes τ with each of the ci’s,
approximates the result τ ∘ ci by an abstract transition c#, and adds c# to T;
it terminates when no new abstract transitions are added.

Infinite Dimensions in Program Analysis. Constraints, and logical operations
over constraints as effective means to represent and manipulate infinite sets and
relations, have been used to overcome the problem of infinite precision and the
problem of infinite error length. There are quite a number of infinite dimensions
in program analysis (local variables, memory cells, objects, abstract data types, 
control points, threads, messages, communication channels, . . . ) that still need 
to be addressed. Probably that number itself is infinite. 
1 Introduction

Constraint Programming (CP) is a healthy research area in the academic com-
munity. The growing number of participants in the CP conference series, as well
as the number of workshops around CP, is good evidence of this. Many major
conferences have a CP track, both in artificial intelligence and in operations
research. The existence of several commercial companies that offer CP tools and
services is further evidence of the value of CP as an industrial technology. ILOG
is one such company. What is unique about us, as far as CP is concerned, is
that the research and development team that produces our CP products is also
responsible for the development of our mathematical programming (MP) tool,
namely ILOG CPLEX. This provides a unique opportunity to contrast the way
these products are developed, marketed and used.

In this paper we argue that current CP technology is much too complex for
the average engineer to use. Worse, we believe that much of the research occurring
in the CP academic community increases this complexity every year. The rest of
the paper provides evidence for this claim, and suggests ways to address the 
issue of simplicity of use by looking at how a similar issue has been addressed 
in the mathematical programming community. 

2 A Comparison Between Math Programming 
and Constraint Programming 

A technical comparison between mathematical programming and constraint pro-
gramming shows many similarities [8]. If we look at it from the angle of an indus-
try user, then the two approaches also look very similar. For instance, problems
are modeled in a very similar way, using variables, constraints, and an objective 
function. There are differences though. For instance, the set of constraint types 
that can be used is usually restricted in math programming to linear or quadratic 
constraints. CP systems usually support a much richer set of constraint types. 
There are also differences in the algorithms used under the hood to solve a prob- 
lem once it is modeled. However, our point is that a typical industry user is not 
interested in understanding the differences in the way solutions are computed 
with a CP system as opposed to the way they are computed with an MP sys- 
tem. What matters is to know if one technology is applicable to the problem at 
hand, and, subsequently, how to best apply it. In this respect MP seems quite
appealing. 

Indeed, the paradigm of mathematical programming tools can be concisely 
defined as: “model and run”. This means that the main thing one has to worry 
about is how the problem is specified in terms of variables, constraints and 
objective functions. There is no need, for instance, to deal with how the search
for a solution should proceed.

Let us contrast this with a typical CP application development. First of all, 
modeling is still a key component here. The problem must be stated in terms 
of variables, constraints and objective function if any. But the richer set of
modeling constructs available in a CP system creates a much larger set of alternative
formulations for a given problem. The choice of the right model requires an
understanding of how the underlying algorithms work. For instance, it is usually 
interesting to use global constraints. This can induce significant changes in the 
way a problem can be modeled in CP. The second very important ingredient in 
a CP solution is the search part. Should the traditional tree search be used, or 
should a local search, or even a hybrid approach be selected? In the case of a 
tree search, the ordering in which decisions are explored is extremely important. 
Designing an ordering that is both robust and efficient can be challenging. A 
third ingredient can be the design of ad hoc global constraints, which again 
requires some expertise. One can argue that the variety of options allowed by 
CP is an asset [12]. This is true, provided the user is a very experienced CP user
who understands many of those options in great detail.

The perceived value is the exact opposite for users who have no background in
CP. The time needed to gain experience with a significant fraction of the available
CP tools and methods may simply be too long given the time allocated for
developing a software solution.



3 Mathematical Programming at Work 

The “model and run” paradigm has far reaching consequences. One can focus 
on the modeling part, regardless of which particular software will be used to 
solve that model. There exist modeling tools such as AMPL [1], GAMS [5], or
OPL Studio [6][11] that let users create a model of the problem they want to
solve independently of the MP software that will be used to solve the model. 
The existence of these modeling tools is possible thanks to the existence of a 
standard file format for expressing mathematical problems: the MPS format. 
Standardization is in turn very interesting for business users, because they can 
develop models without being tied to a particular software vendor. By con-
trast, models written with one particular CP tool cannot usually be solved with
another tool unless some significant rewriting occurs.
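
To make the idea of a solver-independent model file concrete, the following Python sketch writes a tiny linear program in the free-format variant of MPS. It is a rough illustration under stated assumptions: the function to_mps, its argument layout, and the row and variable names are hypothetical, and individual solvers differ in which MPS variants and extensions they accept.

```python
def to_mps(name, objective, constraints, upper_bounds):
    """Render a small linear program as free-format MPS text.

    objective:    {variable: coefficient}, minimized by convention
    constraints:  {row: (sense, {variable: coefficient}, rhs)} where
                  sense is 'L' (<=), 'G' (>=) or 'E' (=)
    upper_bounds: {variable: upper bound}
    """
    lines = [f"NAME {name}", "ROWS", " N COST"]
    for row, (sense, _, _) in constraints.items():
        lines.append(f" {sense} {row}")

    lines.append("COLUMNS")
    variables = sorted(
        set(objective)
        | {v for _, coeffs, _ in constraints.values() for v in coeffs}
    )
    for var in variables:
        if var in objective:
            lines.append(f" {var} COST {objective[var]}")
        for row, (_, coeffs, _) in constraints.items():
            if var in coeffs:
                lines.append(f" {var} {row} {coeffs[var]}")

    lines.append("RHS")
    for row, (_, _, rhs) in constraints.items():
        lines.append(f" RHS {row} {rhs}")

    lines.append("BOUNDS")
    for var, bound in upper_bounds.items():
        lines.append(f" UP BND {var} {bound}")

    lines.append("ENDATA")
    return "\n".join(lines)

if __name__ == "__main__":
    # minimize X + 2 Y  subject to  X + Y <= 4  and  X <= 3
    print(to_mps(
        "TINY",
        objective={"X": 1.0, "Y": 2.0},
        constraints={"LIM1": ("L", {"X": 1.0, "Y": 1.0}, 4.0)},
        upper_bounds={"X": 3.0},
    ))
```

The resulting text contains only variables, constraints and an objective; nothing in it says how the problem should be solved, which is what allows the same file to be handed to different MP solvers.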

The identification of modeling as the main activity of interest has spurred the 
publication of books that focus on modeling, see [13] for instance. There is no 
description of algorithms in such books. Rather, these books are cookbooks for
problem solving. Let us contrast this with CP books such as [11][10][9][2][7][4][3].
These books either focus on the description of algorithms used to solve CSPs, or
on the design of CP languages, or both. Some of them may also contain sections
devoted to the modeling activity, but this is never their core.

Another consequence of the “model and run” paradigm is that MP systems 
are designed to support it. This means that typical MP systems such as ILOG 
CPLEX can read a model from a file, and can compute solutions to it without 
requiring any additional input¹. By contrast, CP systems are development
toolkits² with which one has to build an algorithm that solves a given problem.
One could say that CP systems are not ready to use, whereas MP systems look 
more like a turnkey application. 

Yet another consequence of the “model and run” paradigm is that the lan- 
guage in which models are stated is fixed in time. This means that improvements 
in the way models can be solved do not require model transformations. By
contrast, the typical way a given CP system is enhanced is by adding features to
it. Let us just give some examples here. One typically adds new global constraints 
in order to benefit from improved domain reduction algorithms. One usually in- 
troduces new search constructs in order to support new search paradigms, or new 
hybridizations. A more specific example is symmetry breaking, where symme- 
tries need to be provided as input. In order to benefit from these enhancements, 
models must be rewritten or augmented. One has to introduce statements that 
add global constraints for instance. This way of augmenting CP in order to 
improve its problem solving capacity permeates the academic research as well. 
Improvements are very often described in terms of new global constraints, new
modeling constructs, or new primitives and operators to program search.



4 A “Model and Run” Paradigm for CP 

One way of decreasing the complexity of current CP systems, tools and tech-
niques is to mimic what has been done in the MP community over the years.
We propose to the CP community the following challenge: try to develop a 
“model and run” paradigm for CP. This means that the following items should 
be developed: 

- A standard file format for expressing CP models.

- A library of CP algorithms that can take models as input and produce
solutions as output, without requiring additional inputs.

- Books that focus on modeling.

- Software improvements that do not require model rewriting.

1 Tuning and more advanced interaction with MP algorithms is also available, but 
these advanced features are usually used by experienced users and academics. They 
are not used by typical industry engineers. 

2 ILOG provides software libraries, hence the analogy with a toolkit. Other vendors 
provide CP programming languages, usually in the form of specialized versions of 
the PROLOG language. To a great extent the argument we make is valid for both 
the library and the languages approaches. 
We believe that the above items generate interesting research problems beyond
the current ones actively pursued by the academic community. We further believe
it is in the long-term interest of the academic community to make sure that CP
technology can be used in industry at large.
The challenge to solve worst-case intractable computational problems lies at the
core of much of the work in the constraint programming community. The tradi- 
tional approach in computer science towards hard computational tasks is to iden- 
tify subclasses of problems with interesting, tractable structure. Linear program- 
ming and network flow problems are notable examples of such well structured 
classes. Propositional Horn theories are also a good example from the domain of 
logical inference. However, it has become clear that many real-world problem do- 
mains cannot be modeled adequately in such well-defined tractable formalisms. 
Instead richer, worst-case intractable formalisms are required. For example, plan- 
ning problems can be captured in general propositional theories and related con- 
straint formalisms and many hardware and software verification problems can 
similarly be reduced to Boolean satisfiability problems. Despite the use of such 
inherently worst-case intractable representations, ever larger real-world problem 
instances are now being solved quite effectively. Recent state-of-the-art satis- 
fiability (SAT) and constraint solvers can handle hand-crafted instances with 
hundreds of thousands of variables and constraints. This strongly suggests that 
worst-case complexity is only part of the story. I will discuss how notions of 
typical case and average case complexity can lead to more refined insights into 
the study and design of algorithms for handling real-world computationally hard 
problems. We will see that such insights result from a cross-fertilization of ideas 
from different communities, in particular, statistical physics, computer science, 
and combinatorics. 

Typical Case Complexity and the Role of Tractable Problem Structure. 

The key to handling intractability is our ability to capture and exploit problem 
structure, a way of taming computational complexity. In general, however, the 
notion of structure is very hard to define. Recently, researchers have made con- 
siderable strides in correlating structural features of problems with typical case 
complexity. In particular, the study of phase transition phenomena is an emerg- 
ing area of research that is changing the way we characterize the computational 

* Work supported in part by the Intelligent Information Systems Institute at Cornell 
University sponsored by AFOSR (F49620-01-1-0076) and an NSF ITR grant (IIS- 
0312910). 
Fig. 1. Median cost of solving a 2+p-SAT problem for different values of p. We observe
linear typical case scaling for a fraction of 3-SAT structure less than 0.4. 

complexity of NP-hard problems, beyond the worst-case complexity notion. Us-
ing tools from statistical physics we are now able to provide a finer characteriza-
tion of the spectrum of computational complexity of instances of NP-complete
problems, identifying typical easy-hard-easy patterns as a function of the ra- 
tio of variables to constraints [1-3]. An interesting related structural concept 
involves the characterization of the complexity of a problem in the presence of 
tractable (sub)components. Monasson et al. [4, 5] introduced the 2+p-SAT prob- 
lem to study the behavior of problems that are a mixture of 2-SAT and 3-SAT 
clauses. The fraction of 3-SAT clauses is defined by a parameter p (0 ≤ p ≤ 1).
This hybrid problem is NP-complete for p > 0. However, somewhat surprisingly, 
Monasson et al. showed that the typical case complexity of the problem scales 
linearly as long as the fraction of 3-SAT clauses is below 0.4. See Figure 1. This
is a promising result, suggesting that real-world instances of NP-complete prob- 
lems may behave in a tractable way as long as they contain a reasonable amount 
of tractable substructure. Finding ways to exploit tractable substructure is very 
much aligned with work in the constraint programming community where one 
relies on special structure captured by global tractable constraints, such as the 
alldiff constraint. 
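
For readers who want to experiment with the 2+p-SAT model, the following Python sketch generates a random instance in which a fraction p of the clauses has three literals and the rest have two. It is only one plausible reading of the model: the function name, the signed-integer clause encoding, and the choice to draw the variables of a clause without repetition are assumptions of the sketch.

```python
import random

def random_2p_sat(num_vars, num_clauses, p, seed=0):
    """Random 2+p-SAT instance as a list of clauses.

    A clause is a tuple of nonzero integers: v stands for variable v and
    -v for its negation. round(p * num_clauses) clauses have three
    literals; the remaining clauses have two.
    """
    rng = random.Random(seed)
    num_three = round(p * num_clauses)
    clauses = []
    for i in range(num_clauses):
        size = 3 if i < num_three else 2
        variables = rng.sample(range(1, num_vars + 1), size)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    rng.shuffle(clauses)  # mix the 2-clauses and 3-clauses
    return clauses

if __name__ == "__main__":
    instance = random_2p_sat(num_vars=20, num_clauses=60, p=0.4)
    print(sum(len(c) == 3 for c in instance), "3-clauses out of", len(instance))
```

Setting p = 0 yields a pure 2-SAT instance and p = 1 a pure 3-SAT instance, so the parameter interpolates between the tractable and the NP-complete case.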

Characterizing Hidden Problem Structure.

In recent work, we have pursued a more general characterization of tractable 
substructure to explain the wide range of solution times - from very short to 
extremely long runs - observed in backtrack search methods, often characterized 
by so-called heavy-tailed distributions [6] . We introduced the notion of a special 
subset of variables, called the backdoor variables [7]. A set of variables forms a 
Fig. 2. A visualization of a SAT-encoded logistics planning instance (logistics.b.cnf,
containing 843 vars and 7301 clauses). Each variable is represented as a node, and 
variables belonging to the same clause are connected by an edge. At the top: the 
original logistics formula. Bottom left: simplified instance after setting 5 backdoor 
variables. Bottom right: simplified instance after setting 12 backdoor variables. We 
observe that the formula is dramatically simplified after setting just a few backdoor 
variables. (Figure by Anand Kapur.) 

backdoor for a problem instance if there is a value assignment to these variables 
such that the simplified instance can be solved in polynomial time by propagation 
and simplification mechanisms. Another way of stating this is to say that after 
setting the backdoor variables the simplified instance falls in a polynomially 
solvable class. Note however that we do not require for this class to have a clear 
syntactic characterization. 
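
The definition can be made concrete with a small Python sketch. It fixes one particular propagation mechanism, unit propagation, as the polynomial-time procedure, which is an assumption of the sketch; the definition above allows any propagation and simplification mechanism. Clauses are tuples of signed integers, and the names unit_propagate and is_backdoor are hypothetical. The check enumerates all assignments to the candidate set, so it is exponential in the size of that set only.

```python
from itertools import product

def unit_propagate(clauses, assignment):
    """Extend the assignment by unit propagation.

    Returns (assignment, residual_clauses), or None if some clause has
    all of its literals falsified.
    """
    assignment = dict(assignment)
    while True:
        residual = []
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
                continue  # clause already satisfied
            free = [lit for lit in clause if abs(lit) not in assignment]
            if not free:
                return None  # conflict
            if len(free) == 1 and unit is None:
                unit = free[0]
            residual.append(free)
        if unit is None:
            return assignment, residual
        assignment[abs(unit)] = unit > 0  # the unit literal must be true

def is_backdoor(clauses, candidate_vars):
    """True if some assignment to candidate_vars lets propagation alone
    satisfy every clause."""
    for values in product([False, True], repeat=len(candidate_vars)):
        result = unit_propagate(clauses, dict(zip(candidate_vars, values)))
        if result is not None and not result[1]:
            return True
    return False

if __name__ == "__main__":
    # (not x1 or x2), (not x2 or x3), (not x3 or x4), (x1 or x2)
    formula = [(-1, 2), (-2, 3), (-3, 4), (1, 2)]
    print(is_backdoor(formula, [1]), is_backdoor(formula, [4]))
```

In the example formula, setting the single variable x1 to false triggers a chain of unit propagations that satisfies every clause, whereas no assignment to x4 alone does.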

Structured problem instances can have surprisingly small sets of backdoor 
variables. When considering SAT encodings of logistics planning problems, we 
found that, for example, the logistics-b planning problem instance has a backdoor 
set of 15 variables, compared to a total of over 800 variables in the formula. 
Figure 2 provides a visualization of the logistics-b planning problem instance. 
We have found similarly small backdoors for other structured problem instances,
such as those from bounded model-checking domains. 

Of course, in practice, the small backdoor set (if it exists) still needs to be 
uncovered by the solver itself. In [7], we show that even when taking into ac- 
count the cost of searching for backdoor variables, one can still obtain an overall 
computational advantage by focusing in on a backdoor set, provided the set is 
sufficiently small. Heuristics, incorporated in many current CSP/SAT solvers, 
also implicitly search for backdoor variables, by uncovering those variables that
cause a large number of unit propagations.

Survey Propagation and Random Walks 

An exciting recent development is the discovery of a completely new class of sat- 
isfiability and constraint solving methods that effectively solve very large (sat- 
isfiable) instances near phase transition boundaries [8]. The approach is called 
survey propagation, and is based on advanced techniques from statistical physics 
used to study properties of disordered systems. Survey propagation is an extended
form of belief propagation. At an intuitive level, the method estimates the prob- 
ability that a given variable has a certain truth value in the set of satisfying 
solutions. The method incrementally sets variables to their most likely values. 
After each variable setting, propagation is used to simplify the problem instance 
and new probability estimates are computed. The strategy is surprisingly
effective: random 3-SAT instances with up to $10^7$ variables very near the phase
transition boundary can be solved. This is an improvement of almost two orders
of magnitude over the previous best approach on hard random instances based 
on biased random walks (WalkSAT) [9]. However, WalkSAT and related meth- 
ods are much more widely applicable and more robust on structured problems. 
The challenge is to adapt survey propagation methods for structured problem
instances. One potential strategy is to use biased random walks to sample from
near-solutions [10] as a complementary method for estimating probabilities for 
variable settings. 
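
For readers unfamiliar with biased random walks, here is a deliberately simplified WalkSAT-style sketch in Python. It is not the implementation of [9]: the greedy step recounts all unsatisfied clauses rather than using break counts, and the function name `walksat` and the `noise`, `max_flips` and `seed` parameters are illustrative assumptions.

```python
import random

def walksat(clauses, n_vars, max_flips=10000, noise=0.5, seed=0):
    """Return a satisfying assignment {var: bool}, or None if none was found."""
    rng = random.Random(seed)
    assign = {v: rng.random() < 0.5 for v in range(1, n_vars + 1)}

    def satisfied(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def cost_after_flip(v):
        # Number of unsatisfied clauses if variable v were flipped.
        assign[v] = not assign[v]
        broken = sum(not satisfied(c) for c in clauses)
        assign[v] = not assign[v]
        return broken

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)           # focus on one violated clause
        if rng.random() < noise:
            var = abs(rng.choice(clause))    # random-walk move
        else:
            var = min((abs(l) for l in clause), key=cost_after_flip)  # greedy move
        assign[var] = not assign[var]
    return assign if all(satisfied(c) for c in clauses) else None

print(walksat([[1, 2], [-1, 3], [-2, -3], [1, -3]], 3))
```

The mix of random and greedy flips is what makes the walk "biased"; the noise level is a tuning parameter in practice.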
Abstract. In this talk we present a number of problems for network
design, planning and analysis and show how they can be addressed with 
different hybrid CP solutions. Clearly, this problem domain is of huge 
practical importance, but it also provides us with interesting, complex 
problem structures. CP directly competes with MILP and local search 
approaches to these problems, with best results often obtained by a com- 
bination of different solution techniques. Teams at Parc Technologies and 
IC-Parc have been working in this field over the last years, with a number 
of applications now embedded in commercial products. 



1 Introduction 

In recent years computer networks have become ubiquitous; they are now part
of everyday life. This has caused a rapid growth in traffic volume, but has also
increased our dependency on their undisturbed operation. The current move to- 
ward ‘converged’ networks, which combine both connection-less (Internet) and 
connection-based (voice, video) traffic in a single IP network environment, in- 
creases the demand for reliable but cost-effective network services. Constraint 
programming can help to provide software tools for various aspects of network 
design and operations. In this talk we show five areas where constraint tech- 
niques are already used, most often in the form of hybrid solvers, combining 
constraints with LP or local search methods. 

2 Network Design 

In its simplest form, network design consists in selecting a capacitated topology 
of nodes and links which can transport predicted demands between customers. 
Links typically can only be chosen between certain locations and from a few
predefined capacity types. The objective is to minimize investment cost, while allowing
enough spare capacity for robust operation. Different solution approaches were
compared in [1], and a branch and price scheme was presented in [2]. But the
design problem has many variations, for example allowing multi-cast traffic [3],
while optical network design may use a very different model [4].

* Part of this work was done while the author was working for Parc Technologies Ltd.



3 Traffic Engineering 

Traditionally, IP networks relied on destination-based routing to forward
packets using routing protocols like OSPF. In this approach the shortest path (with
respect to link weights) between source and destination is used to transport packets.
This can lead to bottlenecks in the network when multiple shortest paths use the
same link. Traffic engineering (TE) tries to overcome this problem by permitting
explicit paths for each source and destination pair. Choosing these paths while taking
connectivity and capacity constraints into account makes it possible to spread the
traffic over the whole network, removing bottlenecks in utilization. There are three
main models for expressing TE problems:

- Link based models use 0/1 integer variables to express whether a demand is
routed over a link or not. The model grows cubically with the network size,
which makes a direct MILP solution impractical for large networks.

- Path based models choose some of the possible paths for each demand and
select a combination of the possible paths which satisfies the capacity
constraints. This approach lends itself to a column generation method with a
branch and price scheme to generate new paths on demand, as used in [2].

- Node based models use a decision variable for each node and demand, which
indicates the next-hop for the demand. This model can be expressed with
traditional finite domain constraints [5].

In recent years, many different techniques have been proposed to solve these 
problems. Hybrid models using constraints include Lagrangian relaxation [6,7], 
local probing [8,9], probe backtracking [10] and local search hybrids [11]. A 
decomposition method was introduced by [12]. 



4 Deducing the Demand Matrix 

The models for traffic engineering and network design all rely on an accurate 
demand matrix, which describes the traffic size between nodes in the network. 
This matrix is surprisingly difficult to obtain. In traditional, connection-based 
networks this data is readily available by design, but IP networks typically only 
collect aggregate link traffic counters. The task is complicated by the fact that 
counter collection may be incomplete and inconsistent. Deducing the traffic ma- 
trix from link counters is the problem of traffic estimation [13], which can be 
seen as a constraint problem with incomplete and incorrect data [14, 15].

5 Network Resilience

The failure of a single network element (link, router) should have minimal impact 
on the operation of the network. Depending on the network technology used, we 
can provide different solutions for improved network resilience: 

- In destination-based routed networks, we can predict the utilization of the
network under element failure based only on current link traffic counters.
This is achieved by a combination of bounds reasoning with a linear traffic
model [16].

- In MPLS networks, it is possible to automatically reroute important traffic
around an element failure, a technique called fast re-route. Parc Technologies
has provided a solution for Cisco’s TunnelBuilder Pro to automatically find
such detours. This problem is related to the traffic bypass problem described
in [17].

- For traffic engineered networks, we can provide secondary paths, which are
node or link disjoint from the primary paths chosen. We can use these
secondaries if one of the elements on the primary path fails. Depending on the
capacity constraints used, this can lead to very challenging models.

6 Bandwidth-on-Demand 

Traffic engineering considers the impact of a set of demands on the current
network, where all demands are active at the same time. We can generalize this
concept to the case where demands have fixed start and end times, and compete for
resources only if they overlap in time. This is the problem of Bandwidth-on-Demand,
where customers can reserve network capacity for future time periods, for example a
video conference between multiple sites with guaranteed throughput and quality
of service. The model of [5] extends to this case; an alternative, repair-based
model for this problem has been proposed in [18, 19] in the context of an ATM
network.
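
To make the "compete only if they overlap in time" condition concrete, the following Python sketch computes the peak reserved bandwidth on a single link. It is a simplification under stated assumptions: routing is already fixed, each demand is a hypothetical `(start, end, bandwidth)` triple over a half-open interval, and the function names are illustrative rather than taken from any of the cited systems.

```python
def peak_load(demands):
    """Peak simultaneous bandwidth of (start, end, bandwidth) demands on one link."""
    events = []
    for start, end, bandwidth in demands:
        events.append((start, bandwidth))   # reservation begins
        events.append((end, -bandwidth))    # reservation ends
    # Sorting puts releases before reservations at equal times, so that
    # back-to-back demands on [start, end) intervals do not count as overlapping.
    events.sort()
    load = peak = 0
    for _, delta in events:
        load += delta
        peak = max(peak, load)
    return peak

def fits(demands, capacity):
    """True if the link capacity is never exceeded at any point in time."""
    return peak_load(demands) <= capacity

demands = [(0, 10, 40), (5, 15, 50), (10, 20, 30)]
print(peak_load(demands), sum(bw for _, _, bw in demands))
```

The three demands total 120 units if treated as simultaneous, as in plain traffic engineering, but their peak overlap is lower, so a link that could not carry all of them at once may still accept the reservations.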

Parc Technologies and IC-Parc have developed a Bandwidth-on-Demand 
(BoD) system for Schlumberger’s dexa.net, a global MPLS network providing 
services in the oil-field sector. This network combines traffic engineering,
multiple classes of service and rate limiting at the ingress points to guarantee delivery
of BoD requests without disrupting existing traffic.

Abstract. Existing random models for the constraint satisfaction problem
(CSP) all require an extremely low constraint tightness in order to
have non-trivial threshold behaviors and guaranteed hard instances at 
the threshold. We study the possibility of designing random CSP models
that have interesting threshold and typical-case complexity behaviors
while at the same time allowing a much higher constraint tightness. We
show that random CSP models that enforce the constraint consistency 
have guaranteed exponential resolution complexity without putting much 
restriction on the constraint tightness. A new random CSP model is pro- 
posed to generate random CSPs with a high tightness whose instances 
are always consistent. Initial experimental results are also reported to 
illustrate the sensitivity of instance hardness to the constraint tightness 
in classical CSP models and to evaluate the proposed new random CSP 
model. 



1 Introduction 

One of the most significant problems with the existing random CSP models is 
that as a model parameter, the constraint tightness has to be extremely low in 
order to have non-trivial threshold behaviors and guaranteed hard instances at 
phase transitions. In [1,2], it was shown that except for a small range of the 
constraint tightness, all of the four classical random CSP models are trivially 
unsatisfiable with high probability due to the existence of the flawed variables. 
For the case of binary CSPs, the constraint tightness has to be less than the 
domain size in order to avoid the flawed variables. Recent theoretical results 
in [3,4] further indicate that even for a moderate constraint tightness, it is 
still possible for these classical models to have an asymptotically polynomial 
complexity due to the appearance of embedded easy subproblems. 

Several new models have been proposed to overcome the trivial
unsatisfiability. In [2], Gent et al. proposed a CSP model, called the flawless random
binary CSP, that is based on the notion of a flawless conflict matrix. Instances 
of the flawless random CSP model are guaranteed to be arc-consistent, and 
thus do not suffer asymptotically from the problem of flawed variables. In [1], 
a nogoods-based CSP model was proposed and was shown to have non-trivial 
asymptotic behaviors. Random CSP models with a (slowly) increasing domain
size have also been shown to be free from the problem of flawed variables and
have interesting threshold behaviors [5,6]. However, none of these models has
addressed the fundamental requirement of an extremely low constraint tightness 
in order to have a guaranteed exponential complexity. The flawless random CSP 
does have a true solubility phase transition at a high constraint tightness, but 
as we will show later, it still suffers from the embedded easy unsatisfiable sub- 
problems at a moderate constraint tightness. In CSP models with an increasing 
domain size, the (relative) constraint tightness should still remain low. In the 
nogoods-based CSP model, it is impossible to have a high constraint tightness
without making the constraint (hyper)graph very dense.

In this paper, we study the possibility of designing non-trivial random CSP 
models that allow a much higher constraint tightness. For this purpose, we show 
that consistency, a notion that has been developed to improve the efficiency 
of CSP algorithms, is in fact the key to the design of random CSP models that 
have guaranteed exponential resolution complexity without the requirement of an 
extremely low constraint tightness. We propose a scheme to generate consistent 
random instances of CSPs that can potentially have a high constraint tightness. 
Initial experiments show that the instances generated by our model are indeed 
much harder at the phase transition than those from the classical CSP models 
and the flawless CSP models. 

2 Random Models for CSPs 

Throughout this paper, we consider binary CSPs defined on a domain $D$ with
$|D| = d$. A binary CSP $\mathcal{C}$ consists of a set of variables $x = \{x_1, \dots, x_n\}$ and
a set of binary constraints $(C_1, \dots, C_m)$. Each constraint $C_i$ is specified by its
constraint scope, a pair of the variables in $x$, and a constraint relation $R_{C_i}$ that
defines a set of incompatible value-tuples in $D \times D$ for the scope variables. An
incompatible value-tuple is also called a restriction. Associated with a binary
CSP is a constraint graph whose vertices correspond to the set of variables and
edges correspond to the set of constraint scopes. In the rest of the paper, we will
be using the following notation:

1. n, the number of variables; m, the number of constraints; 

2. d, the domain size; and t, the constraint tightness, i.e., the size of the
restriction set.

Given two variables, their constraint relation can be specified by a 0-1 matrix,
called the conflict matrix, where an entry 0 at $(i, j)$ indicates that the tuple
$(i, j) \in D \times D$ is incompatible. Another way to describe the constraint relation
is to use the compatible graph, a bipartite graph with the domain of each variable
as an independent partite, where an edge signifies that the corresponding value-tuple
is compatible.

An instance of a CSP is said to be k-consistent if and only if for any (k-1)
variables, each consistent (k-1)-tuple assignment to the (k-1) variables can be
extended to an assignment to any other kth variable such that the k-tuple is also
consistent. A CSP instance is called strongly k-consistent if and only if it is
j-consistent for each $j \le k$. Of special interest is strong k-consistency for
k = 1, 2, 3, also known as node-consistency, arc-consistency, and path-consistency,
respectively.
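
The case k = 2 of this definition can be checked directly on the restriction sets. The Python sketch below is one possible way to do so, under assumptions that are not part of the paper: domain values are numbered from 0, an instance is a dictionary mapping a scope `(u, v)` to its set of incompatible value-tuples, and the function name `is_arc_consistent` is hypothetical.

```python
def is_arc_consistent(d, constraints):
    """Check 2-consistency of a binary CSP with domain {0, ..., d-1}.

    constraints maps a scope (u, v) to the set of incompatible tuples (a, b).
    """
    for restricted in constraints.values():
        for a in range(d):
            # Value a of the first variable needs a compatible partner b.
            if all((a, b) in restricted for b in range(d)):
                return False
        for b in range(d):
            # Value b of the second variable needs a compatible partner a.
            if all((a, b) in restricted for a in range(d)):
                return False
    return True

print(is_arc_consistent(2, {(0, 1): {(0, 0), (1, 1)}}))
print(is_arc_consistent(2, {(0, 1): {(0, 0), (0, 1)}}))
```

The first instance is a simple "not equal" constraint and is arc-consistent; in the second, value 0 of the first variable has no support.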

Definition 1 (Random Binary CSP $B_{n,m}^{d,t}$). Let $0 < t < d^2$ be an integer.
$B_{n,m}^{d,t}$ is a random CSP model such that

1. its constraint graph is the standard random graph $G(n, m)$, where the m edges of
the graph are selected uniformly from all the possible $\binom{n}{2}$ edges; and

2. for each of the edges of $G(n, m)$, a constraint relation on the corresponding scope
variables is specified by choosing t value-tuples from $D \times D$ uniformly as its
restriction set.

$B_{n,m}^{d,t}$ is known in the literature as Model B. It has been shown in [1,
2] that for $t \ge d$, $B_{n,m}^{d,t}$ is asymptotically trivial and unsatisfiable, and that it has a
phase transition in satisfiability probability for $t < d$. This motivates the
introduction of the flawless conflict matrix to make sure that the random model is
arc-consistent [2].
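
A minimal Python sketch of sampling from Model B may help fix the notation. The function names `model_b` and `flawed_values`, the 0-based numbering of variables and values, and the dictionary representation are assumptions made for illustration only.

```python
import random
from itertools import combinations, product

def model_b(n, m, d, t, seed=None):
    """Sample one instance as {(u, v): set of t incompatible (a, b) tuples}."""
    rng = random.Random(seed)
    # Constraint graph G(n, m): m distinct edges chosen uniformly.
    edges = rng.sample(list(combinations(range(n), 2)), m)
    tuples = list(product(range(d), repeat=2))
    # Restriction set: t distinct value-tuples chosen uniformly from D x D.
    return {edge: set(rng.sample(tuples, t)) for edge in edges}

def flawed_values(d, restricted):
    """Values of the first scope variable that conflict with every value of the second."""
    return [a for a in range(d) if all((a, b) in restricted for b in range(d))]

instance = model_b(n=10, m=15, d=4, t=3, seed=1)
print(any(flawed_values(4, r) for r in instance.values()))
```

With t < d a single constraint can never rule out a whole row of the conflict matrix, so no value is flawed; once t reaches d, flawed values become possible, which is the source of the trivial unsatisfiability discussed above.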

Definition 2 ($B_{n,m}^{d,t}[1]$, Flawless Random Binary CSP). In the flawless
random binary CSP $B_{n,m}^{d,t}[1]$, the constraint graph is defined in the same way as
that in $B_{n,m}^{d,t}$. For each constraint edge, the constraint relation is specified in two
steps:

1. Choosing a random permutation $\pi$ of $D = \{1, \dots, d\}$; and

2. Selecting a set of t value-tuples uniformly from $(D \times D) \setminus \{(i, \pi(i)), 1 \le i \le d\}$
as the restriction set.

The reason that we use a suffix “[1]” in the symbol $B_{n,m}^{d,t}[1]$ will become clear
after we introduce the generalized flawless random CSPs later in this paper. By
specifying a set of tuples $\{(i, \pi(i)), 1 \le i \le d\}$ that will not be considered when
choosing incompatible value-tuples, the resulting model is guaranteed to be
arc-consistent and consequently will not have flawed variables. However, even though
the flawless random binary CSP $B_{n,m}^{d,t}[1]$ does not suffer from the problem of trivial
unsatisfiability, it can be shown that $B_{n,m}^{d,t}[1]$ asymptotically has embedded easy
subproblems for $t \ge d - 1$ in the same way as the random binary CSP model.

Theorem 1. For $t \ge d - 1$, there is a constant $c^* > 0$ such that for any $m/n > c^*$,
with high probability $B_{n,m}^{d,t}[1]$ is asymptotically unsatisfiable and can be solved in
polynomial time.
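
The two-step construction of Definition 2 can be sketched in Python as follows. This is an illustration under assumptions, not the authors' generator: values are numbered from 0 rather than 1, the name `flawless_model` is hypothetical, and the instance also records the permutation so that the protected tuples can be inspected.

```python
import random
from itertools import combinations, product

def flawless_model(n, m, d, t, seed=None):
    """Sample {(u, v): (pi, restricted)} following the two steps of Definition 2."""
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(range(n), 2)), m)
    instance = {}
    for edge in edges:
        pi = list(range(d))
        rng.shuffle(pi)                              # step 1: random permutation of D
        protected = {(i, pi[i]) for i in range(d)}   # tuples that stay compatible
        candidates = [p for p in product(range(d), repeat=2) if p not in protected]
        restricted = set(rng.sample(candidates, t))  # step 2: needs t <= d*d - d
        instance[edge] = (pi, restricted)
    return instance

instance = flawless_model(n=10, m=15, d=4, t=12, seed=1)
pi, restricted = next(iter(instance.values()))
print(all((i, pi[i]) not in restricted for i in range(4)))
```

Even at the largest admissible tightness used here, every value keeps the partner assigned to it by the permutation, which is why the model is arc-consistent by construction.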

A detailed proof outline of Theorem 1 can be found in the Appendix, Section
6.1. The idea is to show that for $m/n > c^*$, the flawless random CSP $B_{n,m}^{d,t}[1]$ will
with high probability contain an unsatisfiable subproblem called an r-flower.
The definition of an r-flower can be found in the Appendix. Furthermore, if a
binary CSP instance contains an r-flower, then any path-consistency algorithm
will produce a new CSP instance in which the center variable of the r-flower
has an empty domain. It follows that its unsatisfiability can be proved in
polynomial time.

It should be noted that $B_{n,m}^{d,t}[1]$ does have a non-trivial phase transition since

it is satisfiable with high probability if $m/n < \frac{1}{2}$. Theorem 1 does not exclude
the possibility that $B_{n,m}^{d,t}[1]$ will also be able to generate hard instances when $m/n$
is below the upper bound $c^*$, in particular in the case of a large domain size.
Further investigation is required to fully understand the complexity of $B_{n,m}^{d,t}[1]$
in this regard.

3 Consistency, Resolution Complexity, 
and Better Random CSP Models 

Propositional resolution complexity deals with the minimum length of resolution 
proofs for an (unsatisfiable) CNF formula. As many backtracking-style complete 
algorithms can be simulated by a resolution proof, the resolution complexity 
provides an immediate lower bound on the running time of these algorithms. 
Since the work of Chvátal and Szemerédi [7], there have been many studies on
the resolution complexity of randomly generated CNF formulas [8,9].

Mitchell [10] developed a framework in which the notion of resolution com- 
plexity is generalized to CSPs and the resolution complexity of randomly gen- 
erated CSPs can be studied. In this framework, the resolution complexity of 
a CSP instance is defined to be the resolution complexity of a natural CNF 
encoding which we give below. Given an instance of a CSP on a set of
variables $\{x_1, \dots, x_n\}$ with a domain $D = \{1, 2, \dots, d\}$, its CNF encoding is
constructed as follows: (1) For each variable $x_i$, there are d Boolean variables
$x_i{:}1, x_i{:}2, \dots, x_i{:}d$, each of which indicates whether or not $x_i$ takes on
the corresponding domain value; and there is a clause $x_i{:}1 \vee x_i{:}2 \vee \dots \vee x_i{:}d$
on these d Boolean variables making sure that $x_i$ takes at least one of the
domain values; (2) For each restriction $(\delta_1, \dots, \delta_k) \in D^k$ of each constraint
$C(x_{i_1}, \dots, x_{i_k})$, there is a clause $\overline{x_{i_1}{:}\delta_1} \vee \dots \vee \overline{x_{i_k}{:}\delta_k}$ to respect the restriction.
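
For binary constraints, the encoding just described can be sketched in a few lines of Python. The numbering of the Boolean variable $x_i{:}a$ as `i * d + a + 1`, the 0-based values, and the function name `csp_to_cnf` are assumptions chosen for illustration; clauses are lists of signed integers in the DIMACS style.

```python
def csp_to_cnf(n, d, constraints):
    """Encode a binary CSP as CNF clauses (lists of signed integers).

    constraints maps a scope (u, v) to its set of incompatible tuples (a, b).
    """
    def var(i, a):
        # Boolean variable "x_i takes value a", numbered from 1.
        return i * d + a + 1

    # (1) Each CSP variable takes at least one domain value.
    clauses = [[var(i, a) for a in range(d)] for i in range(n)]
    # (2) Each restriction (a, b) on scope (u, v) forbids that pair of values.
    for (u, v), restricted in sorted(constraints.items()):
        for a, b in sorted(restricted):
            clauses.append([-var(u, a), -var(v, b)])
    return clauses

print(csp_to_cnf(2, 2, {(0, 1): {(0, 0)}}))
```

As in the text, the sketch adds no clauses forcing a variable to take at most one value.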

In [10, 4], upper bounds on the constraint tightness t were established for the
random CSPs to have an exponential resolution complexity. For the random binary
CSP $B_{n,m}^{d,t}$, the bound is (1) $t < d - 1$; or (2) $t < d$ and $m/n$ is sufficiently small. For
a moderate constraint tightness, recent theoretical results in [3, 4] indicate that
it is still possible for these classical models to have an asymptotically polynomial
complexity due to the existence of embedded easy subproblems. The primary
reason for the existence of embedded easy subproblems is that for a moderate
constraint tightness, constraints frequently imply forcers which force a pair of
involved variables to take on a single value-tuple.

In the following, we will show that it is not necessary to put restrictions on 
the constraint tightness in order to have a guaranteed exponential resolution 
complexity. Using arguments similar to those in [10,4,11], it can be
shown that if in $B_{n,m}^{d,t}$ the constraint relation of each constraint were chosen in
such a way that the resulting instances are always strongly k-consistent ($k \ge 3$),
then $B_{n,m}^{d,t}$ has an exponential resolution complexity no matter how large the
constraint tightness is.

Theorem 2. Let $B_{n,m}^{d,t}[SC]$ be a random CSP such that (1) its constraint graph
is the standard random graph $G(n, m)$; and (2) for each edge, the constraint
relation is such that any instance of $B_{n,m}^{d,t}[SC]$ is strongly k-consistent. Then
the resolution complexity of $B_{n,m}^{d,t}[SC]$ is almost surely exponential.

Proof. See Appendix. 

Using the tool developed in [4], the requirement of strong k-consistency for 
CSP instances to have an exponential resolution complexity can be further re- 
laxed. We call a CSP instance weakly path-consistent if it is arc-consistent and 
satisfies the conditions of path-consistency for paths of length 3 or more. 

Theorem 3. Let $B_{n,m}^{d,t}[WC]$ be a random CSP such that (1) its constraint graph
is the random graph $G(n, m)$; and (2) for each edge, the constraint relation
is such that any instance of $B_{n,m}^{d,t}[WC]$ is weakly path-consistent. Then the
resolution complexity of $B_{n,m}^{d,t}[WC]$ is almost surely exponential.

Proof. See Appendix. 

The question remaining to be answered is whether or not there are any 
natural random CSP models that are guaranteed to be strongly k-consistent or 
weakly path-consistent. In fact, the CSP encoding of the random graph k-coloring
problem is strongly k-consistent. Another example is the flawless random binary
CSP $B_{n,m}^{d,t}[1]$, which is guaranteed to be arc-consistent, i.e., strongly 2-consistent.
In the rest of this section, we discuss how to generate random CSPs with a high 
tightness that are strongly 3-consistent or weakly path-consistent. 

Definition 3 ($B_{n,m}^{d,t}[\mathcal{K}]$, Generalized Flawless Random Binary CSP). In
the generalized flawless random binary CSP $B_{n,m}^{d,t}[\mathcal{K}]$, $\mathcal{K}$ is a random bipartite
graph with each partite being the domain D of a variable. The constraint graph is
defined in the same way as that in $B_{n,m}^{d,t}$. For each constraint edge, the constraint
relation is specified as follows:

1. Generate the bipartite graph $\mathcal{K}$ satisfying certain properties; and

2. Select a set of t value-tuples uniformly from $(D \times D) \setminus E(\mathcal{K})$ as the restriction
set.

The idea in the generalized flawless random binary CSP is that by enforcing
a subset of value-tuples (specified by the edges of the bipartite graph $\mathcal{K}$) to be
always compatible, it is possible that the generated CSP instance will always
satisfy a certain level of consistency. If we define $\mathcal{K}$ to be a 1-regular bipartite
graph, then $B_{n,m}^{d,t}[\mathcal{K}]$ reduces to the flawless random binary CSP model $B_{n,m}^{d,t}[1]$.
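
The following Python sketch illustrates the generalized construction with one particular choice of protected graph. It makes several assumptions beyond the text: the graph is a randomly relabelled circulant bipartite graph, which is l-regular and, for 2 <= l <= d, connected, but is not sampled uniformly from all such graphs; values are numbered from 0; and the names `circulant_core` and `generalized_flawless` are hypothetical.

```python
import random
from itertools import combinations, product

def circulant_core(d, l, rng):
    """An l-regular bipartite graph on two copies of {0, ..., d-1}, as a set of edges."""
    p1, p2 = list(range(d)), list(range(d))
    rng.shuffle(p1)
    rng.shuffle(p2)
    # Position k on the left is joined to positions k, k+1, ..., k+l-1 (mod d) on the right.
    return {(p1[k], p2[(k + j) % d]) for k in range(d) for j in range(l)}

def generalized_flawless(n, m, d, t, l, seed=None):
    """Sample {(u, v): (core, restricted)} with restrictions drawn outside the core."""
    rng = random.Random(seed)
    edges = rng.sample(list(combinations(range(n), 2)), m)
    all_tuples = list(product(range(d), repeat=2))
    instance = {}
    for edge in edges:
        core = circulant_core(d, l, rng)                         # step 1
        candidates = [p for p in all_tuples if p not in core]
        instance[edge] = (core, set(rng.sample(candidates, t)))  # step 2: t <= d*d - l*d
    return instance

instance = generalized_flawless(n=10, m=15, d=4, t=8, l=2, seed=1)
core, restricted = next(iter(instance.values()))
print(len(core), len(restricted), core.isdisjoint(restricted))
```

With l = 1 the protected graph is a perfect matching and the sketch coincides with the flawless construction; larger l protects more tuples and lowers the maximum tightness to d*d - l*d.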

The following result shows that a connected and l-regular bipartite graph $\mathcal{K}$
with sufficiently large l can be used to generate strongly 3-consistent random
CSPs or weakly path-consistent random CSPs.

Theorem 4. Let $\mathcal{K}$ be an l-regular connected random bipartite graph. Then
$B_{n,m}^{d,t}[\mathcal{K}]$ is always

1. strongly 3-consistent if and only if l > and

2. weakly path-consistent if and only if l >

Proof. We only prove the case for weak path-consistency; the case for the strong 
3-consistency is similar. 

Consider a path $x_1 - x_2 - x_3 - x_4$ and any assignment $x_1 = i$ and $x_4 = j$.
There are l values of $x_2$ that are compatible with $x_1 = i$ and there are l values of $x_3$
that are compatible with $x_4 = j$. Since the conflict graph is connected, there are at
least l + 1 values of $x_3$ that are compatible with $x_1 = i$. Therefore if $l > (d - 1)/2$,
there must be a value of $x_3$ that is compatible with both $x_1 = i$ and $x_4 = j$.

To see the “only if” part, we will show that there is a connected bipartite
graph $\mathcal{K}(V, U)$ on two sets of vertices $V = \{v_1, v_2, \dots, v_d\}$ and $U = \{u_1, u_2, \dots, u_d\}$
such that the neighbors of the first l vertices in $V$ are the first l + 1 vertices
in $U$. First, we construct a complete bipartite graph on the vertex sets
$\{v_1, v_2, \dots, v_l\}$ and $\{u_1, u_2, \dots, u_l\}$. Second, we construct an l-regular connected
bipartite graph on the vertex sets $\{v_{l+1}, \dots, v_d\}$ and $\{u_{l+1}, \dots, u_d\}$ such that
$(v_{l+1}, u_{l+1})$ is an edge. We then replace the two edges $(v_l, u_l)$ and $(v_{l+1}, u_{l+1})$
with two new edges $(v_l, u_{l+1})$ and $(v_{l+1}, u_l)$. This gives the bipartite graph
$\mathcal{K}(V, U)$. □

The generalized random CSP model $B_{n,m}^{d,t}[\mathcal{K}]$ with a connected regular
bipartite $\mathcal{K}$ allows a constraint tightness up to $\frac{d+1}{2} d$. The above theorem also
indicates that this is the best possible constraint tightness when using an
arbitrary connected bipartite graph $\mathcal{K}$. To achieve higher constraint tightness, we
propose a recursive scheme to generate a bipartite graph $\mathcal{K}$ that is more efficient
in its use of edges.

Definition 4 (Consistency Core). Let $D_1 = D_2$ be the domains of two
variables with $|D_1| = |D_2| = d$. The consistency core for the domains $D_1$ and $D_2$
is a bipartite graph $G_{core}(D_1, D_2)$ on $D_1$ and $D_2$, and is defined recursively as
follows.

1. Let $\{D_{ij}, 1 \le j \le s\}$ be a partition of $D_i$ such that $|D_{ij}| \ge 3$.

2. If $s < 3$, $G_{core}(D_1, D_2)$ is equal to an $l_0$-regular connected bipartite graph on
$D_1(\pi_1) = \{\pi_1(1), \dots, \pi_1(d)\}$ and $D_2(\pi_2) = \{\pi_2(1), \dots, \pi_2(d)\}$ where $\pi_1, \pi_2$
are two permutations of $\{1, 2, \dots, d\}$ and $l_0 >$

3. For $s \ge 3$, let $\pi_1, \pi_2$ be two permutations of $S = \{1, 2, \dots, s\}$ and
$G(S(\pi_1), S(\pi_2), l)$
be an l-regular connected bipartite graph on $S(\pi_1) = \{\pi_1(1), \dots, \pi_1(s)\}$ and
$S(\pi_2) = \{\pi_2(1), \dots, \pi_2(s)\}$. The edge set of $G_{core}(D_1, D_2)$ is defined to be
the union of the edge sets of all consistency cores $G_{core}(D_{1i}, D_{2j})$ where i
and j are integers such that $(i, j) \in G(S(\pi_1), S(\pi_2), l)$.

Theorem 5. If a consistency core is used for $\mathcal{K}$, then $B_{n,m}^{d,t}[\mathcal{K}]$ is

1. strongly 3-consistent if and only if l > and

2. weakly path-consistent if and only if l >

Proof. By induction on the domain size and using the previous theorem.

Using the consistency core, we can define random CSP models with con- 
straint tightness well above For example, if the domain size d is 12, the 

generalized random CSP model $B_{n,m}^{d,t}[\mathcal{K}]$ with a consistency core $\mathcal{K}$ allows
a constraint tightness up to $144 - 6 \times 8 = 96$.

Example 1. Consider the consistency core $\mathcal{K}$ where the domain size is $|D| = 9$
and assume that all the permutations used in Definition 4 are identity
permutations and l = s = 3. Figure 1 shows the consistency core where the edges
connected to two vertices in the lower partite are depicted. Using such a
consistency core, a constraint on two variables $x_i, x_j$ in $B_{n,m}^{d,t}[\mathcal{K}]$ with t = 45 has a set
of restrictions

$\{(i, j);\ i = 3a_1 + a_2 \text{ and } j = 3b_1 + b_2 \text{ are integers such that } a_1 \neq b_1 \text{ and } a_2 \neq b_2\}$.

An instance of this CSP model can be viewed as a generalized 3-colorability 
problem. 




Fig. 1. A special type of consistency core with the domain size 9. 



4 Experiments 

In this section, we report results of two sets of preliminary experiments designed 
to (1) study the effect of an increase in the constraint tightness on the typical-case 
complexity; and (2) compare typical-case instance hardness between the classical 
random CSPs, flawless random CSPs, and the generalized flawless random CSPs. 



4.1 Effect of an Increase in the Constraint Tightness 

In [3,4], upper bounds on the constraint tightness have been established for
random CSPs to have an exponential resolution complexity for any constant
constraint-to-variable ratio $m/n$. Molloy [4] further showed that for a constraint
tightness above the upper bound, the existence of forcers can be compensated
by a sufficiently low constraint-to-variable ratio so that one can still have typical
instances with exponential resolution complexity.

We have conducted the following experiments to gain further understanding
of the effect of an increase in the constraint tightness (and hence an increase
in the likelihood of the existence of a forcer in a constraint) on the typical-case 
hardness of random CSPs. The experiments also help understand the behavior of 
CSP models, such as the flawless CSP model, that only enforce arc-consistency 
(strong 2-consistency). 

In the experiments, we start with a random 3-CNF formula whose clauses 
are treated as constraints. We then incrementally increase the tightness of each 
constraint by adding more clauses defined over the same set of variables. There 
are two reasons why we have based our experiments on random SAT models. 
First, the typical-case complexity of the random SAT model is well-understood 
and therefore, experiments based on the random SAT model will enable us to 
have an objective comparison on the impact of an increase in the constraint 
tightness. Secondly, the complexity of Boolean-valued random CSPs obtained by
increasing the tightness of the random 3-CNF formula has been characterized
in great detail. We have a clear picture regarding the appearance of embedded
easy subproblems in these Boolean-valued random CSPs [3].

Let $\mathcal{F}(n, m)$ be a random 3-CNF formula with n variables and m clauses.
We construct a new random 3-CNF formula $\mathcal{F}(n, m, a)$ as follows:

1. $\mathcal{F}(n, m, a)$ contains all the clauses in $\mathcal{F}(n, m)$;

2. For each clause C in $\mathcal{F}(n, m)$, we generate a random clause on the same set
of variables of C, and add this new clause to $\mathcal{F}(n, m, a)$ with probability a.

In fact, $\mathcal{F}(n, m, a)$ is the random Boolean CSP model with an average
constraint tightness 1 + a and has been discussed in [3]. For a > 0, it is easy to see
that $\mathcal{F}(n, m, a)$ is always strongly 2-consistent, but is not 3-consistent
asymptotically with probability 1.
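
The construction of $\mathcal{F}(n, m, a)$ can be sketched in Python as below. The sketch makes assumptions the text leaves open: signs are chosen uniformly, an added clause is not prevented from coinciding with the original one, and the names `random_3cnf` and `tightened_3cnf` are hypothetical.

```python
import random

def random_3cnf(n, m, rng):
    """m random clauses, each on 3 distinct variables from 1..n with random signs."""
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        formula.append([v if rng.random() < 0.5 else -v for v in variables])
    return formula

def tightened_3cnf(n, m, a, seed=None):
    """Base formula followed by the extra clauses added with probability a."""
    rng = random.Random(seed)
    base = random_3cnf(n, m, rng)
    extra = []
    for clause in base:
        if rng.random() < a:
            # New random clause on the same three variables as `clause`.
            extra.append([abs(l) if rng.random() < 0.5 else -abs(l) for l in clause])
    return base + extra

formula = tightened_3cnf(n=250, m=1000, a=0.1, seed=0)
print(len(formula))
```

Each original clause acts as one constraint of tightness 1, and each added clause raises the tightness of that constraint, which gives the average tightness of roughly 1 + a mentioned above.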

Figure 2 shows the median of the number of branches used by the SAT solver
zChaff on 100 instances of $\mathcal{F}(n, m, a)$ for $a = 0.0, 0.1, 0.2$.

As expected, an increase in the tightness results in a shift of the location 
of the hardness peak toward smaller m/n. More significant is the magnitude of 
the decrease of the hardness as a result of a small increase of the constraint 
tightness. 

From [3], the upper bounds on m/n for $\mathcal{F}(n, m, a)$ to have an exponential
resolution complexity are 23.3 if a = 0.1 and 11.7 if a = 0.2. Since the
constraint-to-variable ratios m/n considered in the experiment are well below
these bounds, above which embedded 2SAT subproblems appear with high
probability, it seems that the impact of forcers on the instance hardness goes beyond
simply producing embedded easy subproblems. As forcers can appear at a rela- 
tively low constraint tightness even in CSP models such as the flawless model, 
approaches that are solely based on restricting constraint tightness to gener- 
ate interesting and typically hard instances cannot be as effective as has been 
previously believed.

Fig. 2. Effects of an increase in the constraint tightness on the instance hardness for
$\mathcal{F}(n, m, 0.0)$, $\mathcal{F}(n, m, 0.1)$, and $\mathcal{F}(n, m, 0.2)$. n = 250.

4.2 Comparisons Between Three Random CSP Models 

This set of experiments is designed to investigate the effectiveness of the
generalized flawless random CSP model. We generate random instances of the classical
random model $B_{n,m}^{d,t}$, the flawless random model $B_{n,m}^{d,t}[1]$, and the generalized
random model $B_{n,m}^{d,t}[\mathcal{K}]$ with the domain size d = 4. For $B_{n,m}^{d,t}[\mathcal{K}]$, we have used a
2-regular connected bipartite graph as $\mathcal{K}$. These instances are then encoded as
CNF formulas and solved by the SAT solver zChaff [12]. It may look unnatural that
we have tested random CSP instances by converting them to SAT instances and
using a SAT solver. This is justified by the following considerations. First, all of
the existing research on the resolution complexity of random CSPs has been
carried out by studying the resolution complexity of a SAT encoding of CSPs
as described in Section 3. We have used the same encoding in the experiments.
Secondly, it has been shown that as far as the complexity of solving unsatisfiable
CSP instances is concerned, many of the existing CSP algorithms can be
efficiently simulated by the resolution system of the corresponding SAT encodings
of the CSPs [13].

The experiments show that the threshold of the solution probability of the
generalized random CSP model $B_{n,m}^{d,t}[\mathcal{K}]$ is much sharper than those of $B_{n,m}^{d,t}$
and $B_{n,m}^{d,t}[1]$. More importantly, instances of $B_{n,m}^{d,t}[\mathcal{K}]$ at the phase transition are
much harder than those of $B_{n,m}^{d,t}$ and $B_{n,m}^{d,t}[1]$, as shown in Tables 1-3 where the
median of the number of branches of zChaff for 100 instances of each of the
three random CSP models is listed at different stages of the solubility phase

Table 1. Maximum median number of branches of zChaff on random instances of
three random CSP models, over all $m/n$. Domain size d = 4 and $\mathcal{K}$ is 2-regular.

            Number of Branches
(n, t)      $B_{n,m}^{d,t}$   $B_{n,m}^{d,t}[1]$   $B_{n,m}^{d,t}[\mathcal{K}]$
(100, 6)    230               224                  399
(300, 6)    1830              1622                 4768
(500, 6)    7152              6480                 45315
(300, 8)    843               1010                 2785

Table 2. Median Number of Branches of zChaff on random instances of three random
CSP models at the smallest $m/n$ where the solution probability is less than 0.1. Domain
size $d = 4$ and $\mathcal{K}$ is 2-regular.

            Number of Branches
(n, t)      $\mathcal{B}^{d,t}_{n,m}$    $\mathcal{B}^{d,t}_{n,m}[1]$    $\mathcal{B}^{d,t}_{n,m}[\mathcal{K}]$
(100, 6)    116       154       241
(300, 6)    819       700       4768
(500, 6)    1398      1649      45315
(300, 8)    204       269       1118

Table 3. Median Number of Branches of zChaff on random instances of three random
CSP models at the largest $m/n$ where the solution probability is greater than 0.9. Domain
size $d = 4$ and $\mathcal{K}$ is 2-regular.

            Number of Branches
(n, t)      $\mathcal{B}^{d,t}_{n,m}$    $\mathcal{B}^{d,t}_{n,m}[1]$    $\mathcal{B}^{d,t}_{n,m}[\mathcal{K}]$
(100, 6)    211          212       199
(300, 6)    1327         1595      2809
(500, 6)    7152         6450      8787
(300, 8)    843(0.67)    709       2785

transition: Table 1 is for the constraint density $m/n$ where the maximum median
of the number of branches is observed; Table 2 is for the constraint density $m/n$
where the solubility probability is less than 0.1; and Table 3 is for the constraint
density $m/n$ where the solubility probability is greater than 0.9.

It can be seen that while the classical random CSP model and the flawless
matrix CSP model differ little, the proposed strong flawless random CSP
model $\mathcal{B}^{d,t}_{n,m}[\mathcal{K}]$, with $\mathcal{K}$ being a
connected 2-regular bipartite graph, is significantly harder in all of the
cases except row 1 in Table 3. It is also interesting to notice that the most
significant difference in the hardness among the three models is at the phase
where instances of the random CSP models are almost always unsatisfiable. A
plausible explanation for this phenomenon is that consistency is a property
that may also help improve the efficiency of search algorithms in solving
satisfiable instances.
5 Conclusions

In this paper, we have shown that consistency, a notion that has been introduced
in an effort to improve the efficiency of CSP algorithms, also plays an important
role in the design of random CSP models that have interesting threshold
behavior and guaranteed exponential complexity at phase transitions, while at
the same time allowing a much higher constraint tightness. We have also proposed
a scheme to generate consistent random CSPs by generalizing the idea
of flawless random CSPs. Initial experiments show that the proposed model is
indeed significantly harder than existing random CSP models.
A Appendix

In this section, we present more concepts related to the resolution complexity
results stated in this paper and outline the proofs of Theorems 1, 2, and 3.
Detailed proofs can be found in [14].

A.1 Theorem 1

In this subsection, we outline the proof of Theorem 1. First let us formalize some
definitions such as a forcer, a forbidding cycle, and an r-flower. Following [10], we
call an expression of the form $x : \alpha$ a literal for a CSP. A literal $x : \alpha$ evaluates
to TRUE at an assignment if the variable $x$ is assigned the value $\alpha$. A nogood for
a CSP, denoted by $\eta(x_1 : \alpha_1, \ldots, x_l : \alpha_l)$, is a disjunction of the negations of the
literals $x_i : \alpha_i$, $1 \le i \le l$. A nogood is equivalent to a restriction $\{\alpha_1, \ldots, \alpha_l\}$ on
the set of variables $\{x_1, \ldots, x_l\}$, and the restrictions of a constraint correspond
to a set of nogoods defined over the same set of variables.

Definition 5 (Forcers [4]). A constraint $C_f$ with $\mathrm{var}(C_f) = \{x_1, x_2\}$ is called
an $(\alpha, \beta)$-forcer if its restriction set corresponds to the set of nogoods

$$NG(C_f) = \{\eta(x_1 : \alpha, x_2 : \gamma);\ \gamma \ne \beta\}.$$

We say that a constraint $C$ contains an $(\alpha, \beta)$-forcer $C_f$ defined on the same set
of variables as $C$ if $NG(C_f) \subseteq NG(C)$.
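As a small illustration of Definition 5, the sketch below tests whether a constraint, given as its set of forbidden value pairs, contains an (alpha, beta)-forcer. The representation of a constraint as a Python set of pairs and the function names are assumptions made for this example only.

```python
def contains_forcer(nogoods, domain, alpha, beta):
    """Return True if the constraint on (x1, x2) whose forbidden value pairs
    are `nogoods` contains an (alpha, beta)-forcer, i.e. it forbids
    (alpha, gamma) for every gamma in the domain other than beta."""
    return all((alpha, gamma) in nogoods for gamma in domain if gamma != beta)

def forcers(nogoods, domain):
    """List every pair (alpha, beta) for which the constraint contains a forcer."""
    return [(a, b) for a in domain for b in domain
            if contains_forcer(nogoods, domain, a, b)]
```

With such a forcer present, assigning alpha to x1 leaves beta as the only value of x2 that the constraint still allows.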

Definition 6 (Forbidding cycles and r-flowers [4]). An $\alpha$-forbidding cycle
for a variable $x_0$ is a set of constraints $C_1(x_0, x_1), C_2(x_1, x_2), \ldots, C_{r-1}(x_{r-2}, x_{r-1})$,
and $C_r(x_{r-1}, x_0)$ such that $C_1(x_0, x_1)$ is an $(\alpha, \alpha_1)$-forcer, $C_r(x_{r-1}, x_0)$
is an $(\alpha_{r-1}, \alpha_r)$-forcer ($\alpha_r \ne \alpha$), and $C_i(x_{i-1}, x_i)$ is an $(\alpha_{i-1}, \alpha_i)$-forcer
($2 \le i \le r - 1$). We call $x_0$ the center variable of the cycle.

An r-flower $R = \{C_1, \ldots, C_d\}$ consists of $d$ (the domain size) forbidding
cycles each of which has length $r$ such that (1) $C_i$, $1 \le i \le d$, have the same
center variable $x$; (2) each $C_i$ is an $\alpha_i$-forbidding cycle of the common center
variable $x$; and (3) these forbidding cycles do not share any other variables.

The following facts are straightforward to establish: 

1. An r-flower consists of s = d(r — 1) + 1 = dr — d + 1 variables and dr 
constraints; 

2. The total number of r-flowers is $\binom{n}{s}\, s!\, (d-1)^{d}\, d^{d(r-1)}$;

3. A constraint in the flawless CSP model is an $(\alpha, \beta)$-forcer only if the pair
$(\alpha, \beta)$ is one of the pre-selected tuples in the flawless constraint matrix.

The probability for a constraint to contain a forcer and the probability for the
flawless random CSP to contain an r-flower are given in the following lemma.

Lemma 1. Consider the flawless random CSP $\mathcal{B}^{d,t}_{n,m}[1]$ and define

$$f_e = \frac{\binom{d^2 - d - d + 1}{t - d + 1}}{\binom{d^2 - d}{t}}.$$
1. The probability that a given constraint $C(x_1, x_2)$ contains an $(\alpha, \beta)$-forcer is

$$\frac{1}{d} f_e. \qquad (1)$$

2. Let $R$ be an r-flower and let $c = m/n$. Then

$$P\{R \text{ appears in } \mathcal{B}^{d,t}_{n,m}[1]\} = \Theta(1)\,(2 c f_e)^{dr}\, \frac{1}{n^{dr}}\, \frac{1}{d^{dr}}. \qquad (2)$$

Proof. Equation (1) follows from the following two observations: (A) $\frac{1}{d}$ is the
probability for $(\alpha, \beta)$ to be one of the pre-selected tuples in the flawless conflict
matrix; and (B) $f_e$ is the probability for the rest of the tuples, $(\alpha, \gamma)$, $\gamma \ne \beta$, to
be in the set of $t$ restrictions selected uniformly from $d^2 - d$ tuples.

To calculate the probability that a given r-flower $R$ appears in $\mathcal{B}^{d,t}_{n,m}[1]$, notice
that the probability of selecting all the constraint edges in $R$ is

$$\frac{\binom{N - dr}{cn - dr}}{\binom{N}{cn}},$$

where $N = \binom{n}{2}$. And for each fixed choice of $dr$ constraint edges in the r-flower,
the probability for these constraints to contain the r-flower is $\left(\frac{1}{d} f_e\right)^{dr}$. □
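The quantity $f_e$ of Lemma 1 is easy to evaluate exactly. The sketch below does so with rational arithmetic; it assumes, as observations (A) and (B) of the proof suggest, that Equation (1) is the product of $1/d$ and $f_e$, and it assumes $t \le d^2 - d$. The function names are hypothetical. For $d = 4$ and $t = 6$, as in the rows of Tables 1-3 with $t = 6$, it gives $f_e = 84/924 = 1/11$.

```python
from fractions import Fraction
from math import comb

def f_e(d, t):
    """Probability that d - 1 specified tuples all lie in a uniformly random
    t-subset of the d*d - d tuples available to a flawless constraint.
    Assumes 0 <= t <= d*d - d."""
    if t < d - 1:
        return Fraction(0)
    return Fraction(comb(d * d - 2 * d + 1, t - d + 1), comb(d * d - d, t))

def forcer_probability(d, t):
    """Probability that a constraint contains a given (alpha, beta)-forcer,
    taken here to be (1/d) * f_e."""
    return f_e(d, t) / d
```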

The outline of the proof of Theorem 1 is given below. 

Proof (Proof of Theorem 1). Let $c^* = \frac{d}{2 f_e}$. We will show that if $c = \frac{m}{n} > c^*$,
then

$$\lim_{n \to \infty} P\{\mathcal{B}^{d,t}_{n,m}[1] \text{ contains an r-flower}\} = 1. \qquad (3)$$

Let $I_R$ be the indicator function of the event that the r-flower $R$ appears in
$\mathcal{B}^{d,t}_{n,m}[1]$ and $X = \sum_{R} I_R$, where the sum is over all the possible r-flowers. Then,
$\mathcal{B}^{d,t}_{n,m}[1]$ contains an r-flower if and only if $X > 0$.

By Lemma 1 and the fact that $s = dr - d + 1$, we have

$$E[X] = \sum_{R} E[I_R] = \Theta(1)\, n^{1-d}\, (2 c f_e)^{dr}.$$

Therefore, if $c > c^*$ and $r = \lambda \log n$ with $\lambda$ sufficiently large, $E[X] \to \infty$. An
application of the second-moment method will prove the theorem. Details can
be found in [14].

Remark 1. The relatively loose upper bound $c^* = \frac{d}{2 f_e}$ in the above proof may
be improved by a factor of $d$ by further distinguishing among the r-flowers that
share forcing values at a different number of shared variables.



A.2 Theorems 2 and 3

In this subsection, we give a brief introduction to the concepts and ideas required 
in the proof of Theorems 2 and 3. 
Given a CNF formula $\mathcal{F}$, we use $\mathrm{Res}(\mathcal{F})$ to denote the resolution complexity
of $\mathcal{F}$, i.e., the length of the shortest resolution refutation of $\mathcal{F}$. The width of
deriving a clause $A$ from $\mathcal{F}$, denoted by $w(\mathcal{F} \vdash A)$, is defined to be the minimum,
over all the resolution derivations of $A$ from $\mathcal{F}$, of the maximum clause size in the
derivation. Ben-Sasson and Wigderson [15] established a relationship between
$\mathrm{Res}(\mathcal{F})$ and $w(\mathcal{F} \vdash \emptyset)$:

$$\mathrm{Res}(\mathcal{F}) = e^{\Omega\left(\frac{(w(\mathcal{F} \vdash \emptyset) - w(\mathcal{F}))^2}{n}\right)},$$

where $w(\mathcal{F})$ is the size of the largest clause in $\mathcal{F}$.
This relationship indicates that to give an exponential lower bound on the
resolution complexity, it is sufficient to show that every resolution refutation of $\mathcal{F}$
contains a clause whose size is linear in $n$, the number of variables.

Let $T$ be an instance of the CSP and $\mathrm{CNF}(T)$ be the CNF encoding of
$T$. Mitchell [10] provided a framework within which one can investigate the
resolution complexity of $T$, i.e., the resolution complexity of the CNF formula
$\mathrm{CNF}(T)$ that encodes $T$, by working directly on the structural properties of
$T$. A sub-instance $J$ of $T$ is a CSP instance such that $\mathrm{var}(J) \subseteq \mathrm{var}(T)$ and
$J$ contains all the constraints of $T$ whose scope variables are in $\mathrm{var}(J)$. The
following crucial concepts make it possible to work directly on the structural
properties of the CSP instance when investigating the resolution complexity of
the encoding CNF formula.

Definition 7 (Implies, defined in [10]). For any assignment $\alpha$ to the variables
in the CSP instance $T$, we write $\hat{\alpha}$ for the truth assignment to the variables in
$\mathrm{CNF}(T)$ that assigns a variable $x : a$ the value TRUE if and only if $\alpha(x) = a$.

Let $C$ be a clause over the variables of $\mathrm{CNF}(T)$. We say that a sub-instance
$J$ of $T$ implies $C$, denoted as $J \models C$, if and only if for each assignment $\alpha$
satisfying $J$, $\hat{\alpha}$ satisfies $C$.

Definition 8 (Clause Complexity [10]). Let $T$ be a CSP instance. For each
clause $C$ defined over the Boolean variables in $\mathrm{var}(\mathrm{CNF}(T))$, define

$$\mu(C, T) = \min\{|\mathrm{var}(J)|;\ J \text{ is a sub-instance and implies } C\}.$$

The following two concepts slightly generalize those used in [10, 4] and enable 
us to have a uniform treatment when establishing resolution complexity lower 
bounds. 



Definition 9 (Boundary). The boundary $B(J)$ of a sub-instance $J$ is the set
of CSP variables such that for each $x \in B(J)$, if $J$ minimally implies a clause
$C$ defined on some Boolean variables in $\mathrm{var}(\mathrm{CNF}(J))$, then $C$ contains at least
one of the Boolean variables, $x : a$, $a \in D$, that encode the CSP variable $x$.

Definition 10 (Sub-critical Expansion [10]). Let $T$ be a CSP instance. The
sub-critical expansion of $T$ is defined as

$$e(T) = \max_{0 \le s \le \mu(\emptyset, T)}\ \min_{s/2 \le |\mathrm{var}(J)| \le s} |B(J)| \qquad (4)$$

where the minimum is taken over all the sub-instances $J$ of $T$ such that
$s/2 \le |\mathrm{var}(J)| \le s$.
The following theorem relates the resolution complexity of the CNF encoding
of a CSP instance to the sub-critical expansion of the CSP instance.

Theorem 6. [10] For any CSP instance $T$, we have

$$w(\mathrm{CNF}(T) \vdash \emptyset) \ge e(T). \qquad (5)$$

To establish an asymptotically exponential lower bound on $\mathrm{Res}(\mathcal{C})$ for a
random CSP $\mathcal{C}$, it is enough to show that there is a constant $\beta^* > 0$ such that

$$\lim_{n \to \infty} P\{e(\mathcal{C}) \ge \beta^* n\} = 1. \qquad (6)$$

For any $\alpha > 0$, let $\mathcal{A}_m(\alpha)$ be the event $\{\mu(\emptyset, \mathcal{C}) > \alpha n\}$ and $\mathcal{A}_s(\alpha, \beta^*)$ be the
event

$$\left\{ \min_{\alpha n/2 \le |\mathrm{var}(J)| \le \alpha n} |B(J)| \ge \beta^* n \right\}.$$

Notice that

$$P\{e(\mathcal{C}) \ge \beta^* n\} \ge P\{\mathcal{A}_m(\alpha) \cap \mathcal{A}_s(\alpha, \beta^*)\}
\ge 1 - P\{\overline{\mathcal{A}_m(\alpha)}\} - P\{\overline{\mathcal{A}_s(\alpha, \beta^*)}\}. \qquad (7)$$

We only need to find appropriate $\alpha^*$ and $\beta^*$ such that

$$\lim_{n \to \infty} P\{\mathcal{A}_m(\alpha^*)\} = 1 \qquad (8)$$

and

$$\lim_{n \to \infty} P\{\mathcal{A}_s(\alpha^*, \beta^*)\} = 1. \qquad (9)$$

Event $\mathcal{A}_m(\alpha^*)$ is about the size of minimally unsatisfiable sub-instances. For
the event $\mathcal{A}_s(\alpha^*, \beta^*)$, a common practice is to identify a special subclass of
boundaries and show that this subclass is large. For different random CSP models
and under different assumptions on the model parameters, there are different
ways to achieve this. Details about the proofs of Theorems 2 and 3 can be found
in [14].
Abstract. Much progress has been made in terms of boosting the ef-
fectiveness of backtrack style search methods. In addition, during the 
last decade, a much better understanding of problem hardness, typi- 
cal case complexity, and backtrack search behavior has been obtained. 
One example of a recent insight into backtrack search concerns so-called 
heavy-tailed behavior in randomized versions of backtrack search. Such
heavy tails explain the large variations in runtime often observed in practice.
However, heavy-tailed behavior certainly does not occur on all instances.
This has led to a need for a more precise characterization of when
heavy-tailedness does and when it does not occur in backtrack search.
In this paper, we provide such a characterization. We identify different 
statistical regimes of the tail of the runtime distributions of random- 
ized backtrack search methods and show how they are correlated with 
the “sophistication” of the search procedure combined with the inherent 
hardness of the instances. We also show that the runtime distribution 
regime is highly correlated with the distribution of the depth of inconsis- 
tent subtrees discovered during the search. In particular, we show that 
an exponential distribution of the depth of inconsistent subtrees com- 
bined with a search space that grows exponentially with the depth of 
the inconsistent subtrees implies heavy-tailed behavior. 



1 Introduction 

Significant advances have been made in recent years in the design of search en- 
gines for constraint satisfaction problems (CSP), including Boolean satisfiability 
problems (SAT). For complete solvers, the basic underlying solution strategy is 
backtrack search enhanced by a series of increasingly sophisticated techniques, 
such as non-chronological backtracking, fast pruning and propagation methods, 
nogood (or clause) learning, and more recently randomization and restarts. For 
example, in areas such as planning and finite model-checking, we are now able to 
solve large CSPs with up to a million variables and several million constraints.

* This work was supported in part by the Intelligent Information Systems Institute, 
Cornell University (AFOSR grant F49620-01-1-0076). 
The study of problem structure of combinatorial search problems has also
provided tremendous insights in our understanding of the interplay between 
structure, search algorithms, and more generally, typical case complexity. For 
example, the work on phase transition phenomena in combinatorial search has 
led to a better characterization of search cost, beyond the worst-case notion of 
NP-completeness. While the notion of NP-completeness captures the computa- 
tional cost of the very hardest possible instances of a given problem, in practice, 
one may not encounter many instances that are quite that hard. In general, CSP 
problems exhibit an “easy-hard-easy” pattern of search cost, depending on the 
constrainedness of the problem [1]. The computationally hardest instances appear
to lie at the phase transition region, the area in which instances change from
being almost all solvable to being almost all unsolvable. The discovery of
“exceptionally hard instances” reveals an interesting phenomenon: such instances seem
to defy the “easy-hard-easy” pattern; they occur in the under-constrained area,
but they seem to be considerably harder than other similar instances and even 
harder than instances from the critically constrained area. This phenomenon was 
first identified by Hogg and Williams in graph coloring and by Gent and Walsh 
in satisfiability problems [2,3]. However, it was shown later that such instances 
are not inherently difficult; for example, by renaming the variables such instances 
can often be easily solved [4,5]. Therefore, the “hardness” of exceptionally hard 
instances does not reside purely in the instances, but rather in the combination 
of the instance with the details of the search method. Smith and Grant provide a 
detailed analysis of the occurrence of exceptionally hard instances for backtrack 
search, by considering a deterministic backtrack search procedure on ensembles 
of instances with the same parameter setting (see e.g., [6]). 

Recently, researchers have noted that for a proper understanding of search 
behavior one has to study full runtime distributions [3,7-10]. In our work we 
have focused on the study of randomized backtrack search algorithms [8]. By 
studying the runtime distribution produced by a randomized algorithm on a 
single instance, we can analyze the variance caused solely by the algorithm, and 
therefore separate the algorithmic variance from the variance between different 
instances drawn from an underlying distribution. We have shown previously
that the runtime distributions of randomized backtrack search procedures can 
exhibit extremely large variance, even when solving the same instance over and 
over again. This work on the study of the runtime distributions of randomized 
backtrack search algorithms further clarified that the source of extreme variance 
observed in exceptionally hard instances was not due to the inherent hardness of
the instances: A randomized version of a search procedure on such instances in 
general solves the instance easily, even though it has a non-negligible probabil- 
ity of taking very long runs to solve the instance, considerably longer than all 
the other runs combined. Such extreme fluctuations in the runtime of backtrack 
search algorithms are nicely captured by so-called heavy-tailed distributions, 
distributions that are characterized by extremely long tails with some infinite 
moments [3,8]. The decay of the tails of heavy-tailed distributions follows a 
power law - much slower than the decay of standard distributions, such as the 
normal or the exponential distribution, that have tails that decay exponentially.
Further insights into the empirical evidence of heavy-tailed phenomena of ran- 
domized backtrack search methods were provided by abstract models of back- 
track search that show that, under certain conditions, such procedures provably 
exhibit heavy-tailed behavior [11,12]. 



Main Results. So far, evidence for heavy-tailed behavior of randomized back- 
track search procedures on concrete instance models has been largely empirical. 
Moreover, it is clear that not all problem instances exhibit heavy-tailed behavior. 
The goal of this work is to provide a better characterization of when heavy-tailed 
behavior occurs, and when it does not, when using randomized backtrack search 
methods. We study the empirical runtime distributions of randomized backtrack 
search procedures across different constrainedness regions of random binary
constraint satisfaction models¹. In order to obtain the most accurate empirical
runtime distributions, all our runs are performed without censorship (i.e., we run
our algorithms without a cutoff) over the largest possible size. Our study reveals
dramatically different statistical regimes for randomized backtrack search algo- 
rithms across the different constrainedness regions of the CSP models. Figure 1 
provides a preview of our results. The figure plots the runtime distributions (the 
survival function, i.e., the probability of a run taking more than x backtracks) 
of a basic randomized backtrack search algorithm (no look-ahead and no look- 
back), using random variable and value selection, for different constrainedness 
regions of one of our CSP models (model E; instances with 17 variables and 
domain size 8). We observe two regions with dramatically different statistical 
regimes of the runtime distribution. 

In the first regime (the bottom two curves in Fig. 1, $p \le 0.07$), we see
heavy-tailed behavior. This means that the runtime distributions decay slowly. In the
log-log plot, we see linear behavior over several orders of magnitude. When we 
increase the constrainedness of our model (higher p), we encounter a different 
statistical regime in the runtime distributions, where the heavy tails disappear.
In this region, the instances become inherently hard for the backtrack search 
algorithm, all the runs become homogeneously long, and therefore the variance 
of the backtrack search algorithm decreases and the tails of its survival function 
decay rapidly (see top two curves in Fig. 1, with p = 0.19 and p = 0.24; tails 
decay exponentially). 

A common intuitive understanding of the extreme variability of backtrack 
search is that on certain runs the search procedure may hit a very large incon- 
sistent subtree that needs to be fully explored, causing “thrashing” behavior. 

1 Hogg and Williams (94) provided the first report of heavy-tailed behavior in the
context of backtrack search. They considered a deterministic backtrack search proce- 
dure on different instances drawn from a given distribution. Our work is of different 
nature as we study heavy-tailed behavior of the runtime distribution of a given ran- 
domized backtrack search method on a particular problem instance, thereby isolating 
the variance in runtime due to different runs of the algorithm. 
[Figure 1 plot: model E <17,8,p>, BT Random; x-axis: number of backtracks]

Fig. 1. Heavy-tailed (linear behavior) and non-heavy-tailed regime in the runtime of
instances of model E <17,8,p>. CDF stands for Cumulative Density Function.




Fig. 2. Inconsistent subtrees in backtrack search. 



To confirm this intuition and in order to get further insights into the statisti- 
cal behavior of our backtrack search method, we study the inconsistent sub-trees 
discovered by the algorithm during the search (see Figure 2). 

The distribution of the depth of inconsistent trees is quite revealing: when 
the distribution of the depth of the inconsistent trees decreases exponentially 
(see Figure 3, bottom panel, p = 0.07) the runtime distribution of the backtrack 
search method has a power law decay (see Figure 3, top panel, p = 0.07). In other 
words, when the backtrack search heuristic has a good probability of finding rela- 
tively shallow inconsistent subtrees, and this probability decreases exponentially 
as the depth of the inconsistent subtrees increases, heavy-tailed behavior occurs. 
Contrast this behavior with the case in which the survival function of the run- 
time distribution of the backtrack search method is not heavy-tailed (see Figure 






Carla P. Gomes et al. 



[Figure 3 plots: (top) model E <17,8,p>, BT Random; (bottom) model E <17,8,p>, BT Random, x-axis: IST depth (N)]

Fig. 3. Example of a heavy-tailed instance (p = 0.07) and a non-heavy-tailed instance
(p = 0.24): (top) survival function of runtime distribution, (bottom) probability density
function of depth of inconsistent subtrees encountered during search. The subtree depth
for the p = 0.07 instance is exponentially distributed.



3, top panel, p = 0.24). In this case, the distribution of the depth of inconsistent 
trees no longer decreases exponentially (see Figure 3, bottom panel, p = 0.24). 

In essence, these results show that the distribution of inconsistent subprob- 
lems encountered during backtrack search is highly correlated with the tail be- 
havior of the runtime distribution. We provide a formal analysis that links the 
exponential search tree depth distribution with heavy-tailed runtime profiles. As 
we will see below, the predictions of our model closely match our empirical data. 
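The link between an exponential depth distribution and a power-law runtime tail can be seen in a back-of-the-envelope calculation. This is a simplified illustration under stated assumptions, not the formal analysis referred to above: suppose the depth $N$ of the inconsistent subtree hit by a run is geometrically distributed, $P[N \ge k] = p^k$ with $0 < p < 1$ (a discrete analogue of exponential decay), and that exploring a subtree of depth $N$ costs $T = b^N$ backtracks for some branching factor $b > 1$. The symbols $p$ (not the constrainedness parameter), $b$, $N$, and $T$ are hypothetical names for this sketch.

```latex
\begin{align*}
P[T \ge x] &= P[b^{N} \ge x] = P[N \ge \log_b x] = p^{\lceil \log_b x \rceil},
  \qquad x \ge 1, \\
p^{\log_b x} &= x^{\log_b p} = x^{-\alpha},
  \qquad \alpha = \frac{\ln(1/p)}{\ln b} > 0, \\
p \, x^{-\alpha} &< P[T \ge x] \le x^{-\alpha}.
\end{align*}
```

Under these assumptions the survival function is sandwiched between two power laws with the same exponent, so it decays polynomially rather than exponentially; the mean $E[T] = \sum_k (1-p) p^k b^k$ is infinite exactly when $pb \ge 1$, i.e., when $\alpha \le 1$.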



2 Definitions, Problem Instances, and Search Methods 

Constraint Networks. A finite binary constraint network $\mathcal{P} = (\mathcal{X}, \mathcal{D}, \mathcal{C})$ is
defined as a set of $n$ variables $\mathcal{X} = \{x_1, \ldots, x_n\}$, a set of domains
$\mathcal{D} = \{D(x_1), \ldots, D(x_n)\}$, where $D(x_i)$ is the finite set of possible values for
variable $x_i$, and a set $\mathcal{C}$ of binary constraints between pairs of variables. A
constraint $C_{ij}$ on the ordered set of variables $(x_i, x_j)$ is a subset of the Cartesian
product $D(x_i) \times D(x_j)$ that specifies the allowed combinations of values for the
variables $x_i$ and $x_j$. A solution of a constraint network is an instantiation of the
variables such that all the constraints are satisfied. The constraint satisfaction 
problem (CSP) involves finding a solution for the constraint network or proving 
that none exists. We used a direct CSP encoding and also a Boolean satisfiability 
encoding (SAT) [13]. 

Random Problems. The CSP research community has always made great
use of randomly generated constraint satisfaction problems for comparing different
search techniques and studying their behavior. Several models for generating
these random problems have been proposed over the years. The oldest one, which
was the most commonly used until the mid-1990s, is model A. A network generated
by this model is characterized by four parameters $\langle N, D, p_1, p_2 \rangle$, where
$N$ is the number of variables, $D$ the size of the domains, $p_1$ the probability of
having a constraint between two variables, and $p_2$ the probability that a pair
of values is forbidden in a constraint. Notice that the variance in the type of
problems generated with the same four parameters can be large, since the actual
number of constraints for two problems with the same parameters can vary
from one problem to another, and the actual number of forbidden tuples for
two constraints inside the same problem can also be different. Model B does not
have this variance. In model B, the four parameters are again $N$, $D$, $p_1$, and $p_2$,
where $N$ is the number of variables, and $D$ the size of the domains. But now,
$p_1$ is the proportion of binary constraints that are in the network (i.e., there are
exactly $c = \lfloor p_1 \cdot N \cdot (N-1)/2 \rfloor$ constraints), and $p_2$ is the proportion of forbidden
tuples in a constraint (i.e., there are exactly $t = \lfloor p_2 \cdot D^2 \rfloor$ forbidden tuples in
each constraint). Problem classes in this model are denoted by $\langle N, D, c, t \rangle$. In
[14] it was shown that model B (and model A as well) can be “flawed” when we
increase $N$. Indeed, when $N$ goes to infinity, we will almost surely have a flawed
variable (that is, one variable which has all its values inconsistent with one of the
constraints involving it). Model E was proposed to overcome this weakness. It is
a three-parameter model, $\langle N, D, p \rangle$, where $N$ and $D$ are the same as in the other
models, and $\lfloor p \cdot D^2 \cdot N \cdot (N-1)/2 \rfloor$ forbidden pairs of values are selected with
repetition out of the $D^2 \cdot N \cdot (N-1)/2$ possible pairs. There is also a way of tackling
the problem of flawed variables in model B. In [15] it is shown that by putting
certain constraints on the relative values of $N$, $D$, $p_1$, and $p_2$, one can guarantee
that model B is sound and scalable, for a range of values of the parameters. In
our work, we only considered instances of model B that fall within such a range
of values.
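The two generators used in this paper can be sketched in a few lines. The code below is an illustrative reading of the definitions above, not the generator used in the experiments; the function names, the dictionary representation (variable pair mapped to a set of forbidden value pairs), and the `rng` parameter are assumptions made for the example.

```python
import random
from itertools import combinations, product
from math import floor

def model_b(n, d, p1, p2, rng=random):
    """Model B: exactly floor(p1*n*(n-1)/2) constraints, each forbidding
    exactly floor(p2*d*d) value pairs.
    Returns {(i, j): set of forbidden (a, b)} with i < j."""
    c = floor(p1 * n * (n - 1) / 2)
    t = floor(p2 * d * d)
    edges = rng.sample(list(combinations(range(n), 2)), c)
    pairs = list(product(range(d), repeat=2))
    return {edge: set(rng.sample(pairs, t)) for edge in edges}

def model_e(n, d, p, rng=random):
    """Model E: draw floor(p*d*d*n*(n-1)/2) forbidden pairs, with repetition,
    from all d*d*n*(n-1)/2 combinations of a variable pair and a value pair."""
    draws = floor(p * d * d * n * (n - 1) / 2)
    edges = list(combinations(range(n), 2))
    nogoods = {}
    for _ in range(draws):
        edge = rng.choice(edges)
        value_pair = (rng.randrange(d), rng.randrange(d))
        # Repeated draws collapse into the same set entry.
        nogoods.setdefault(edge, set()).add(value_pair)
    return nogoods
```

Because model E samples with repetition, the number of distinct forbidden pairs in an instance can be smaller than the number of draws, and different constraints generally forbid different numbers of pairs.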

Search Trees. A search tree is composed of nodes and arcs. A node $u$ represents
an ordered partial instantiation $I(u) = (x_{i_1} = v_{i_1}, \ldots, x_{i_k} = v_{i_k})$. A search tree
is rooted at the particular node $u_0$ with $I(u_0) = \emptyset$. There is an arc from a node
$u$ to a node $u_c$ if $I(u_c) = (I(u), x = v)$, $x$ and $v$ being a variable and one of its
values. The node $u_c$ is called a child of $u$ and $u$ a parent of $u_c$. Every node $u$
in a tree $T$ defines a subtree $T_u$ that consists of all the nodes and arcs below $u$
in $T$. The depth of a subtree $T_u$ is the length of the longest path from $u$ to any
other node in $T_u$. An inconsistent subtree (IST) is a maximal subtree that does
not contain any node $u$ such that $I(u)$ is a solution. (See Fig. 2.) The maximum
depth of an inconsistent subtree is referred to as the “inconsistent subtree depth”
(ISTD). We denote by $T(A, P)$ the search tree of a backtrack search algorithm
$A$ solving a particular problem $P$, which contains a node for each instantiation
visited by $A$ until a solution is reached or inconsistency of $P$ is proved. Once
assigned a partial instantiation $I(u) = (x_{i_1} = v_{i_1}, \ldots, x_{i_k} = v_{i_k})$ for node $u$,
the algorithm will search for a partial instantiation of some of its children. In
the case that there exists no instantiation which does not violate the constraints,
algorithm $A$ will take another value for variable $x_{i_k}$, and start again checking the
children of this new node. In this situation, it is said that a backtrack happens.
We use the number of wrong decisions or backtracks to measure the search cost
of a given algorithm [16]².

Algorithms. In the following, we will use different search procedures that differ
in the amount of propagation they perform, and in the order in which they generate
instantiations. We used three levels of propagation: no propagation (backtracking,
BT), removal of values directly inconsistent with the last instantiation
performed (forward-checking, FC), and arc consistency propagation (maintaining
arc consistency, MAC). We used three different heuristics for ordering variables:
random selection of the next variable to instantiate (random), variables
pre-ordered by decreasing degree in the constraint graph (deg), and selection of the
variable with smallest domain first, ties broken by decreasing degree (dom+deg).
Value selection was always random. For the SAT encodings we used the
Davis-Putnam-Logemann-Loveland (DPLL) procedure. More specifically, we used a simplified
version of Satz [17], without its standard heuristic, and with static variable
ordering, injecting some randomness in the value selection heuristics.
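As a concrete reference point for the simplest configuration (BT with random variable and value selection), the following is a minimal sketch of randomized chronological backtracking on a binary CSP. It is not the authors' implementation; in particular, counting every rejected value as one backtrack is an assumption about how "wrong decisions" are tallied, and the function name and the nogood-dictionary input format are hypothetical.

```python
import random

def randomized_bt(n, d, nogoods, rng=random):
    """Chronological backtracking with random variable and value selection,
    no look-ahead and no look-back.

    nogoods maps a variable pair (i, j), i < j, to its forbidden value pairs.
    Returns (solution dict or None, number of backtracks).
    """
    backtracks = 0

    def consistent(assignment, var, val):
        # Check the new value against every previously assigned variable.
        for other, other_val in assignment.items():
            if other < var:
                key, pair = (other, var), (other_val, val)
            else:
                key, pair = (var, other), (val, other_val)
            if pair in nogoods.get(key, ()):
                return False
        return True

    def search(assignment):
        nonlocal backtracks
        if len(assignment) == n:
            return dict(assignment)
        unassigned = [v for v in range(n) if v not in assignment]
        var = rng.choice(unassigned)
        values = list(range(d))
        rng.shuffle(values)
        for val in values:
            if consistent(assignment, var, val):
                assignment[var] = val
                result = search(assignment)
                if result is not None:
                    return result
                del assignment[var]
            backtracks += 1  # assumed cost unit: one rejected value
        return None

    return search({}), backtracks
```

Running the procedure repeatedly on one fixed instance with different random seeds yields the sample of search costs from which a runtime distribution is built.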

Heavy-Tailed or Pareto-Like Distributions. As we discussed earlier, the
runtime distributions of backtrack search methods are often characterized by
very long tails or heavy tails (HT). These are distributions that have so-called
Pareto-like decay of the tails. For a general Pareto distribution $F(x)$, the
probability that a random variable is larger than a given value $x$, i.e., its survival
function, is:

$$1 - F(x) = P[X > x] \sim C x^{-\alpha}, \quad x > 0,$$

where $\alpha > 0$ and $C > 0$ are constants. That is, we have power-law decay in the tail of
the distribution. These distributions have infinite variance when $1 < \alpha < 2$ and

2 In the rest of the paper sometimes we refer to the search cost as runtime. Even 
though there are some discrepancies between runtime and the search cost measured 
in number of wrong decisions or backtracks, such differences are not significant in 
terms of the tail regime of the distributions. 
infinite mean and variance when $0 < \alpha \le 1$. The log-log plot of the tail of the
survival function $(1 - F(x))$ of a Pareto-like distribution shows linear behavior
with slope determined by $\alpha$.
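The relation between the log-log slope and the tail index can be checked numerically. The sketch below fits a least-squares line to log(survival) against log(x); for a tail of the form $C x^{-\alpha}$ the fitted slope is close to $-\alpha$. The function name and the input format (a list of points already restricted to the tail) are assumptions for this example, and a plain least-squares fit is only one of several ways to estimate a tail index.

```python
import math

def loglog_slope(points):
    """Least-squares slope of log(survival) against log(x).

    points: iterable of (x, survival) pairs with x > 0 and survival > 0,
    restricted to the part of the tail being examined.
    """
    logs = [(math.log(x), math.log(s)) for x, s in points]
    mean_x = sum(lx for lx, _ in logs) / len(logs)
    mean_y = sum(ly for _, ly in logs) / len(logs)
    sxx = sum((lx - mean_x) ** 2 for lx, _ in logs)
    sxy = sum((lx - mean_x) * (ly - mean_y) for lx, ly in logs)
    return sxy / sxx
```

A survival function that decays exponentially does not produce a straight line on this scale: its log-log curve bends downward, so the fitted slope keeps decreasing as the fit is moved further into the tail.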

3 Empirical Results 

In the previous section, we defined our models and algorithms, as well as the 
concepts that are central in our study: the runtime distributions of our backtrack 
search methods and the associated distributions of the depth of the inconsistent 
subtrees found by the backtrack method. As we discussed in the introduction, 
our key findings are: (1) we observe different regimes in the behavior of these 
distributions as we move along different instance constrainedness regions; (2) 
when the depth of the inconsistent subtrees encountered during the search by the 
backtrack search method follows an exponential distribution, the corresponding 
backtrack search method exhibits heavy-tailed behavior. In this section,
we provide the empirical data upon which these findings are based. 

We present results for the survival functions of the search cost (number of 
wrong decisions or number of backtracks) of our backtrack search algorithms. 
All the plots were computed with at least 10,000 independent executions of a 
randomized backtrack search procedure on a given (uniquely generated) satisfiable
problem instance. For each parameter setting we considered over 20 instances.
In order to obtain more accurate empirical runtime distributions, all our runs
were performed without censorship, i.e., we ran our algorithms without any
cutoff³. We also instrumented the code to obtain the information for the
corresponding inconsistent sub-tree depth (ISTD) distributions.
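The survival functions plotted in this section are empirical ones, computed from the recorded search costs of the independent runs. A minimal sketch of that computation follows; the function name and the list-of-costs input are assumptions for the example, and the computation is only meaningful for uncensored data, since a cutoff would hide the very tail being studied.

```python
from collections import Counter

def empirical_survival(costs):
    """Return sorted (x, fraction of runs with cost > x) pairs, one per
    distinct observed cost. Assumes every run finished (no cutoff)."""
    n = len(costs)
    counts = Counter(costs)
    remaining = n
    points = []
    for x in sorted(counts):
        remaining -= counts[x]  # runs with cost <= x are no longer "surviving"
        points.append((x, remaining / n))
    return points
```

The last point always has survival 0, so it has to be dropped before the data are placed on a log-log scale.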

Figure 4 (top) provides a detailed view of the heavy-tailed and non-heavy- 
tailed regions, as well as the progression from one region to the other. The figure 
displays the survival function (log-log scale) for running (pure) backtrack search 
with random variable and value selection on instances of Model E with 17 variables
and a domain size of 8, for values of p (the constrainedness of the instances)
in the range 0.05 < p < 0.24. We clearly identify the heavy-tailed region in
which the log-log plot of the survival functions exhibits linear behavior, while in 
the non-heavy-tailed region the drop of the survival function is much faster than 
linear. The transition between regimes occurs around a constrainedness level of 
p = 0.09. 
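The distinction between a linear and a faster-than-linear drop on the log-log plot can be checked numerically. The following Python sketch is illustrative only and is not the analysis code used for the figures; the function name `loglog_tail_slope`, the `x_min` threshold, and the synthetic data points are assumptions.

```python
import math

def loglog_tail_slope(points, x_min):
    """Least-squares slope of log P[X > x] against log x for x >= x_min."""
    xs, ys = [], []
    for x, p in points:
        if x >= x_min and p > 0:
            xs.append(math.log(x))
            ys.append(math.log(p))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Exact Pareto tail P[X > x] = x^(-0.5): the slope is -0.5, so alpha = 0.5.
pareto = [(x, x ** -0.5) for x in (1, 2, 4, 8, 16, 32)]
alpha = -loglog_tail_slope(pareto, x_min=1)

# An exponential tail is not linear on a log-log scale: it bends downward.
expo = [(x, math.exp(-x)) for x in (1, 2, 4, 8, 16, 32)]
```

For a heavy-tailed curve the fitted slope stays roughly constant as `x_min` grows, whereas for the exponential points the slope keeps getting steeper.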

Figure 4 (bottom) depicts the probability density function of the correspond- 
ing inconsistent sub-tree depth (ISTD) distributions. The figure shows that while 
the ISTD distributions that correspond to the heavy-tailed region have an expo- 
nential behavior (below we show a good regression fit to the exponential distri- 
bution in this region), the ISTD distributions that correspond to the non-heavy- 
tailed region are quite different from the exponential distribution. 

3 For our data analysis, we needed purely uncensored data. We could therefore only 
consider relatively small problem instances. The results appear to generalize to larger 
instances. 
[Figure 4 plots. Top: model E <17,8,p>, BT-random; x-axis: number of wrong decisions. Bottom: model E <17,8,p>, BT-random; x-axis: IST depth (N).]

Fig. 4. The progression from heavy-tailed regime to non-heavy-tailed regime: (top)
survival function of runtime distribution; (bottom) probability density function of the
corresponding inconsistent sub-tree depth (ISTD) distribution.



For all the backtrack search variants that we considered on instances of model 
E, including the DPLL procedure, we observed a pattern similar to that of 
figure 4. (See bottom panel of figure 6 for DPLL data.) 

We also observed similar behavior (a transition from the heavy-tailed region
to the non-heavy-tailed region with increased constrainedness) for instances
of Model B, for different problem sizes and different search strategies. Figure 5
(top) shows the survival functions of runtime distributions of instances of model
B <20, 8, 60, t>, for different levels of constrainedness, solved with BT-random.
Fig. 5 (bottom) shows the survival functions of runtime distributions of instances
of model B <50, 10, 167, t>, for different levels of constrainedness, solved with
MAC-random, a considerably more sophisticated search procedure. The top panel
of Fig. 6 gives the DPLL data. Again, the two different statistical regimes of the
runtime distributions are quite clear.






[Figure 5 plots. Top: model B <20,8,60,t>, BT-random; x-axis: number of backtracks. Bottom: model B <50,10,167,t>, MAC-random.]

Fig. 5. Heavy-tailed and non-heavy-tailed regimes for instances of model B: (top)
<20, 8, 60, t>, using BT-random; (bottom) <50, 10, 167, t>, using MAC-random.

To summarize our findings: 

— For both models (B and E), for CSP and SAT encodings, for each backtrack 
search strategy, we clearly observe two different statistical regimes: a
heavy-tailed and a non-heavy-tailed regime.

— As constrainedness increases ( p increases), we move from the heavy-tailed 
region to the non-heavy-tailed region. 

— The transition point from heavy-tailed to non-heavy-tailed regime is depen- 
dent on the particular search procedure adopted. As a general observation, 
we note that as the efficiency (and, in general, propagation strength) of the 
search method increases, the extent of the heavy-tailed region increases
and therefore the heavy-tailed threshold gets closer to the phase transition. 

— Exponentially distributed inconsistent sub-tree depth (ISTD) combined with 
exponential growth of the search space as the tree depth increases implies 
heavy-tailed runtime distributions. We observe that as the ISTD distribu- 
tions move away from the exponential distribution, the runtime distributions 
become non-heavy-tailed. 
[Figure 6 plots. Top: model B <20,8,60,t>, DP-random. Bottom: model E <25,10,p>, DP-random.]

Fig. 6. Heavy-tailed and non-heavy-tailed regimes for instances of (top) model B
<20, 8, 60, t>, using DP-random (DPLL procedure with static variable ordering and random
value selection) and (bottom) model E <25, 10, p>, using DP-random.

These results suggest that the existence of heavy-tailed behavior in the cost 
distributions depends on the efficiency of the search procedure as well as on the 
level of constrainedness of the problem. Increasing the algorithm efficiency tends 
to shift the heavy-tail threshold closer to the phase transition. 

For both models, B and E, and for the different search strategies, we clearly
observed that when the ISTD follows an exponential distribution, the corresponding
runtime distribution is heavy-tailed. We refer to the forthcoming long version
of this paper for the probability density functions of the corresponding inconsistent
sub-tree depth (ISTD) distributions of model B, and for data on the
regression fits (see also below) for all curves.

4 Validation 

Let $X$ be the search cost of a given backtrack search procedure, $P_{istd}[N]$ be
the probability of finding an inconsistent subtree of depth $N$ during search, and
$P[X > x \mid N]$ the probability of having an inconsistent search tree of size larger
than $x$, given a tree of depth $N$. Assuming that the inconsistent search tree
depth follows an exponential distribution in the tail and the search cost inside
an inconsistent tree grows exponentially, then the cost distribution of a search
method is lower bounded by a Pareto distribution. More formally (see footnote 4):

Theoretical Model.

Assumptions:

— $P_{istd}[N]$ is exponentially distributed in the tail, i.e.,

    $$P_{istd}[N] = B_1 e^{-B_2 N}, \quad N > n_0 \qquad (1)$$

  where $B_1$, $B_2$, and $n_0$ are constants.

— $P[X > x \mid N]$ is modeled as a complementary Heaviside function,
  $1 - H(x - k^N)$, where $k$ is a constant and

    $$H(x - a) = \begin{cases} 0, & x < a \\ 1, & x \geq a \end{cases}$$

Then, $P[X > x]$ is Pareto-like distributed,

    $$P[X > x] \geq \beta x^{-\alpha}$$

for $x > k^{n_0}$, where $\alpha$ and $\beta$ are constants.

Derivation of result:

Note that $P[X > x]$ is lower bounded as follows:

    $$P[X > x] \geq \int_{N=0}^{\infty} P_{istd}[N] \, P[X > x \mid N] \, dN \qquad (2)$$

This is a lower bound since we consider only one inconsistent tree contributing
to the search cost, when in general there are more inconsistent trees. Given the
assumptions above, Eq. (2) results in

    $$P[X > x] \geq \int_{N=0}^{\infty} P_{istd}[N] \, (1 - H(x - k^N)) \, dN = \int_{N=\ln x / \ln k}^{\infty} P_{istd}[N] \, dN \qquad (3)$$

Since $x > k^{n_0}$, we can use Eq. (1) for $P_{istd}[N]$, so Eq. (3) results in:

    $$P[X > x] \geq \int_{N=\ln x / \ln k}^{\infty} B_1 e^{-B_2 N} \, dN = \frac{B_1}{B_2} e^{-B_2 \ln x / \ln k} = \beta x^{-\alpha}, \quad \alpha = \frac{B_2}{\ln k}, \quad \beta = \frac{B_1}{B_2}$$

In order to validate this model empirically we consider an instance of model
B <20, 8, 60, 7>, running BT-random, the same instance plotted in Fig. 5 (top), for

4 See forthcoming extended version of the paper for further details. A similar analysis 
goes through using the geometric distribution, the discrete analogue of the exponen- 
tial. 
Fig. 7. Regressions for the estimation of $B_1 = 0.015$, $B_2 = 0.408$ (top plot; quality of
fit $R^2 = 0.88$), and $k = 4.832$ (middle plot; $R^2 = 0.98$) and comparison of lower
bound based on the theoretical model with empirical data (bottom plot). We have
$\alpha = B_2 / \ln(k) = 0.26$ from our model; $\alpha = 0.27$ directly from runtime data. Model B
<20, 8, 60, t>, using BT-random.



which heavy-tailed behavior was observed ($t = 7$). The plots in Fig. 7 provide the
regression data and fitted curves for the parameters $B_1$, $B_2$, and $k$, using $n_0 = 1$.
The good quality of the linear regression fit suggests that our assumptions are
very reasonable. Based on the estimated values for $k$, $B_1$, and $B_2$, we then
compare the lower bound predicted using the formal analysis presented above
with the empirical data. As we can see from Fig. 7, the theoretical model provides
a good (tight) lower bound for the empirical data.
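The predicted tail index can be recomputed from the regression estimates reported in the caption of Fig. 7. The Python sketch below only evaluates the closed form $\beta x^{-\alpha}$ with $\alpha = B_2 / \ln k$ and $\beta = B_1 / B_2$; the function name `pareto_lower_bound` is hypothetical, and the sketch is an illustration rather than the authors' validation code.

```python
import math

# Regression estimates reported in the caption of Fig. 7.
B1, B2, K = 0.015, 0.408, 4.832

def pareto_lower_bound(x, b1=B1, b2=B2, k=K):
    """Lower bound beta * x**(-alpha) on P[X > x], valid for x > k**n0."""
    alpha = b2 / math.log(k)
    beta = b1 / b2
    return beta * x ** (-alpha)

alpha = B2 / math.log(K)
beta = B1 / B2
print(f"alpha = {alpha:.2f}, beta = {beta:.4f}")
```

With these inputs $\alpha$ rounds to 0.26, the model value quoted in the caption, which is close to the 0.27 estimated directly from the runtime data.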

5 Conclusions and Future Work 

We have studied the runtime distributions of complete backtrack search meth- 
ods on instances of well-known random CSP binary models. Our results reveal 
different regimes in the runtime distributions of the backtrack search procedures 
and corresponding distributions of the depth of the inconsistent sub-trees. We 
see a changeover from heavy-tailed behavior to non-heavy-tailed behavior when
we increase the constrainedness of the problem instances. The exact point of
changeover depends on the sophistication of the search procedure, with more
sophisticated solvers exhibiting a wider range of heavy-tailed behavior. In the
non-heavy-tailed region, the instances become harder and harder for the backtrack
search algorithm, and the runs become nearly homogeneously long. We
have also shown that there is a clear correlation between the distributions
of the depth of the inconsistent sub-trees encountered by the backtrack
search method and the heavy-tailedness of the runtime distributions, with
exponentially distributed sub-tree depths leading to heavy-tailed search. To further
validate our findings, we compared our theoretical model, which models expo- 
nentially distributed subtrees in the search space, with our empirical data: the 
theoretical model provides a good (tight) lower bound for the empirical data. 
Our findings about the distribution of inconsistent subtrees in backtrack search 
give, in effect, information about the inconsistent subproblems that are created 
during the search. We believe that these results can be exploited in the design 
of more efficient restart strategies and backtrack solvers. 
Abstract. One of the most appealing features of constraint programming
is its rich constraint language for expressing combinatorial opti-
mization problems. This paper demonstrates that traditional combina- 
tors from constraint programming have natural counterparts for local 
search, although their underlying computational model is radically dif- 
ferent. In particular, the paper shows that constraint combinators, such 
as logical and cardinality operators, reification, and first-class expressions 
can all be viewed as differentiable objects. These combinators naturally 
support elegant and efficient modelings, generic search procedures, and 
partial constraint satisfaction techniques for local search. Experimental 
results on a variety of applications demonstrate the expressiveness and 
the practicability of the combinators. 



1 Introduction 

Historically, most research on modeling and programming tools for combinatorial 
optimization has focused on systematic search, which is at the core of branch 
& bound and constraint satisfaction algorithms. However, in recent years, in- 
creased attention has been devoted to the design and implementation of pro- 
gramming tools for local search (e.g., [2,20,16,7,8,12,18]). This is motivated 
by the orthogonal strengths of the paradigms, the difficulty of obtaining efficient 
implementations, and the lack of compositionality and reuse in local search. 

Comet [9, 17] is a novel, object-oriented, programming language specifically 
designed to simplify the implementation of local search algorithms. Comet sup- 
ports a constraint-based architecture for local search organized around two main 
components: a declarative component which models the application in terms of 
constraints and functions, and a search component which specifies the search 
heuristic and meta-heuristic. Constraints and objective functions are natural 
vehicles to express combinatorial optimization problems and often capture com- 
binatorial substructures arising in many practical applications. But constraints 
and objective functions have a fundamentally different computational model in 
Comet as they do not prune the search space. Rather they are differentiable ob- 
jects that maintain a number of properties incrementally and provide algorithms 
to evaluate the effect of various operations on these properties. The search com- 
ponent then uses these functionalities to guide the local search using selectors 
and other high-level control structures [17]. The architecture enables local search 
algorithms to be high-level, compositional, and modular. 
However, constraint programming languages and libraries also offer rich languages
for combining constraints, including logical and cardinality operators,
reification, and expressions over variables. These “combinators” are fundamental 
in practice, not only because they simplify problem modeling, but also because 
they lay the foundations for expressing various kinds of ad-hoc constraints that 
typically arise in complex applications, as well as generic search procedures. 

This paper shows that traditional constraint programming combinators bring 
similar benefits for local search, although their underlying computational models
differ significantly. In particular, the paper shows that arithmetic, logical, and
cardinality operators, reification, and first-class expressions can all be viewed 
as differentiable objects, providing high-level and efficient abstractions for com- 
posing constraints and objectives in local search. These combinators naturally 
support very high-level local search models, partial constraint satisfaction tech- 
niques, and generic search procedures, which are independent of the application 
at hand and only rely on generic interfaces for constraints and objective func- 
tions. As a consequence, they foster the separation of concerns between mod- 
eling and search components, increase modularity, and favor compositionality 
and reuse. More generally, the paper shows that the rich language of constraint 
programming is conceptually robust and brings similar benefits to constraint 
programming and local search, despite their fundamentally different computa- 
tional models. 

The rest of this paper is organized as follows. Sections 2 and 3 discuss how 
to combine constraints and objective functions in Comet, Section 4 introduces 
first-class expressions, and Section 5 discusses generic search procedures. Appli- 
cations and experimental results are presented together with the abstractions in 
these sections. Section 6 concludes the paper. 

2 Constraints 

Constraints in Comet are differentiable objects implementing the interface (par- 
tially) described in Figure 1. The interface gives access to the constraint variables 
(method getVariables) and to two incremental variables which represent the 
truth value of the constraint (method isTrue) and its violation degree (method 
violations). The violation degree is constraint-dependent and measures how 
much the constraint is violated. This information is often more useful than the 
constraint truth value as far as guiding the search is concerned. For instance, the 
violation degree of an arithmetic constraint $l \geq r$ is given by $\max(0, r - l)$. The
violation degree of the combinatorial constraint allPresent(R, x), which holds
if all values in range R occur in array x, is given by $\#\{v \in R \mid \neg\exists i : x[i] = v\}$.
The method getViolations returns the violations which may be attributed to 
a given variable, which is often useful in selecting local moves. The remaining 
methods provide the differentiable API of the constraint. They make it possi- 
ble to evaluate the effect of an assignment, a swap, or multiple assignments on 
the violation degree. The differentiable API is fundamental in obtaining good 
performance on many applications, since the quality of local moves can be evaluated
quickly. The rest of this section describes modeling abstractions to combine
constraints. Table 1 summarizes the violation degrees of the various combinators.

    Interface Constraint {
      inc{int}[] getVariables();
      inc{boolean} isTrue();
      inc{int} violations();
      int getViolations(inc{int} x);
      int getAssignDelta(inc{int} x, int v);
      int getSwapDelta(inc{int} x1, inc{int} x2);
      int getAssignDelta(inc{int}[] x, int[] v);
    }

Fig. 1. The Constraint Interface in Comet (Partial Description).

Table 1. The Semantics of Some Constraint Combinators.

Combinator                     Violation Degrees
-----------------------------  ------------------------------------------------
c1 && c2                       $v(c_1) + v(c_2)$
c1 || c2                       $\min(v(c_1), v(c_2))$
$\tau(c)$                      if $v(c) > 0$ then 1 else 0
exactly(k, [c1, ..., cn])      $\mathrm{abs}(\sum_{i=1}^{n} \tau(c_i) - k)$
atmost(k, [c1, ..., cn])       $\max(\sum_{i=1}^{n} \tau(c_i) - k, 0)$
k × c                          $k \times v(c)$
satisfactionConstraint(c)      $\tau(c)$
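The logical, weighting, and satisfaction rows of Table 1 can be mimicked in a few lines of Python. This is only a sketch of the semantics, not of Comet's implementation: the `Constraint` class, the helper names `geq`, `neq`, and `satisfaction`, and the choice of violation degree 1 for a violated disequality are assumptions, and `assign_delta` re-evaluates from scratch instead of using incremental algorithms. Cardinality operators are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Assignment = Dict[str, int]

@dataclass(frozen=True)
class Constraint:
    """A constraint reduced to its violation degree (0 means satisfied)."""
    violations: Callable[[Assignment], int]

    def is_true(self, a: Assignment) -> bool:
        return self.violations(a) == 0

    def assign_delta(self, a: Assignment, var: str, value: int) -> int:
        # Non-incremental stand-in for getAssignDelta: re-evaluate from scratch.
        return self.violations({**a, var: value}) - self.violations(a)

    def __and__(self, other):    # c1 && c2: v(c1) + v(c2)
        return Constraint(lambda a: self.violations(a) + other.violations(a))

    def __or__(self, other):     # c1 || c2: min(v(c1), v(c2))
        return Constraint(lambda a: min(self.violations(a), other.violations(a)))

    def __rmul__(self, k: int):  # k * c: k * v(c)
        return Constraint(lambda a: k * self.violations(a))

def geq(left: str, right: str) -> Constraint:
    """left >= right, with violation degree max(0, right - left)."""
    return Constraint(lambda a: max(0, a[right] - a[left]))

def neq(left: str, right: str) -> Constraint:
    """left != right, with violation degree 1 when the values coincide."""
    return Constraint(lambda a: 1 if a[left] == a[right] else 0)

def satisfaction(c: Constraint) -> Constraint:
    """satisfactionConstraint(c): violation degree 0 or 1 only."""
    return Constraint(lambda a: 1 if c.violations(a) > 0 else 0)

a = {"x": 3, "y": 3, "z": 7}
disj = neq("x", "y") | neq("x", "z")   # satisfied through x != z
conj = geq("x", "z") & neq("x", "y")   # 4 + 1 violations
weighted = 2 * geq("x", "z")           # 2 * 4 violations
```

Every combinator returns another `Constraint`, so combined constraints can be nested or queried through `assign_delta` exactly like primitive ones, which is the compositional property the section relies on.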

Constraint Systems. Constraint systems, a fundamental modeling abstraction
in Comet, are container objects representing a conjunction of constraints.
Constraint systems are constraints themselves and implement the Constraint
interface. Hence they maintain their truth value and their violation degree, i.e.,
the sum of the violation degrees of their constraints. They also support the
differentiable API. Figure 2 depicts a simple Comet program for the n-queens
problem. Lines 1-4 describe the problem variables and data structures, lines 5-9
describe the modeling component, and lines 10-13 specify the search procedure.
Line 3 creates a uniform distribution and line 4 declares the decision variables:
variable queen[i] represents the row of the queen placed in column i. These
incremental variables are initialized randomly using the uniform distribution. Line
5 declares the constraint system and lines 6-8 specify the problem constraints
using the ubiquitous allDifferent constraint. More precisely, they specify that
the queens cannot be placed on the same row, the same upper diagonal, or
the same lower diagonal. Lines 10-13 describe a min-conflict search procedure
[11]. Line 11 selects the queen q with the most violations and line 12 chooses a
new value v for queen q. Line 13 assigns this new value to the queen, which has
the effect of (possibly) updating the violation degree of the subset of affected
constraints and of the constraint system. Lines 11-13 are iterated until a solution
is found.
 1. range Size = 1..1024;
 2. LocalSolver m();
 3. UniformDistribution distr(Size);
 4. inc{int} queen[i in Size](m, Size) := distr.get();
 5. ConstraintSystem S(m);
 6. S.post(allDifferent(queen));
 7. S.post(allDifferent(all(i in Size) queen[i] + i));
 8. S.post(allDifferent(all(i in Size) queen[i] - i));
 9. m.close();
10. while (S.violations() > 0)
11.   selectMax(q in Size)(S.getViolations(queen[q]))
12.     selectMin(v in Size)(S.getAssignDelta(queen[q], v))
13.       queen[q] := v;

Fig. 2. A Comet Program for the Queens Problem.
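For readers without access to Comet, the search of Figure 2 can be approximated in plain Python. The sketch below is an assumption-laden stand-in: it recomputes violations from scratch rather than maintaining them incrementally, it counts for each queen the number of other queens sharing its row or a diagonal (which may differ from Comet's allDifferent violation degree), and it adds random tie-breaking and an iteration limit that Figure 2 does not have. The names `queens_violations` and `solve_queens` are hypothetical.

```python
import random

def queens_violations(queen):
    """Per-queen violation counts under the three allDifferent constraints."""
    groups = [{}, {}, {}]  # rows, upper diagonals, lower diagonals
    for col, row in enumerate(queen):
        for g, key in zip(groups, (row, row + col, row - col)):
            g[key] = g.get(key, 0) + 1
    return [
        sum(g[key] - 1 for g, key in zip(groups, (row, row + col, row - col)))
        for col, row in enumerate(queen)
    ]

def solve_queens(n, seed=0, max_iters=10_000):
    """Min-conflict search; returns a solution or None if the limit is hit."""
    rng = random.Random(seed)
    queen = [rng.randrange(n) for _ in range(n)]
    for _ in range(max_iters):
        viol = queens_violations(queen)
        if sum(viol) == 0:
            return queen
        # Counterpart of selectMax: the queen with the most violations.
        q = max(range(n), key=lambda c: (viol[c], rng.random()))

        def total_if(v):
            # Counterpart of getAssignDelta: total violations if queen[q] = v.
            trial = queen[:q] + [v] + queen[q + 1:]
            return sum(queens_violations(trial))

        # Counterpart of selectMin: the row with the fewest total violations.
        queen[q] = min(range(n), key=lambda v: (total_if(v), rng.random()))
    return None
```

A call such as `solve_queens(8)` either returns a list of row indices with no conflicts or `None`; like any pure min-conflict search it can stall in a local minimum, which is why the iteration limit is there.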

Constraint systems provide a clean separation between the declarative and 
search components of a local search. Observe that it is possible to add new 
constraints to the constraint system without changing the search procedure. 
Similarly, it is possible to change the search procedure (e.g., adding a tabu list) 
without modifying the model. It is also important to stress that a single Comet 
program may use several constraint systems simultaneously. This is useful, for 
instance, for local search algorithms that maintain the feasibility of a subset of 
constraints (e.g., the hard constraints), while allowing others to be violated (e.g., 
the soft constraints). 

Logical and Cardinality Operators. One of the appealing features of constraint
programming is its ability to combine constraints using logical and cardinality
operators. Comet offers similar functionalities for local search. For instance,

    Constraint c = ((x != y) || (x != z));

illustrates a disjunctive constraint in Comet. The disjunctive constraint is a
differentiable object implementing the Constraint interface. In particular, the
violation degree of a disjunction $c = c_1 \,||\, c_2$ is given by $\min(v(c_1), v(c_2))$, where
$v(c)$ denotes the violation degree of $c$.

Comet also features a variety of cardinality operators. For instance, Figure
3 depicts the use of the cardinality operator exactly(k, [$c_1, \ldots, c_n$]), a
differentiable constraint which holds if exactly k constraints hold in $c_1, \ldots, c_n$. The
figure depicts a Comet program to solve the magic series problem, a traditional
benchmark in constraint programming. A series $(s_0, \ldots, s_n)$ is magic if $s_i$
represents the number of occurrences of $i$ in $(s_0, \ldots, s_n)$. Lines 6-10 in Figure 3
specify the modeling component. Line 8 features the cardinality operator to express
that there are magic[v] occurrences of v in the magic series and line 9
adds the traditional redundant constraint. Lines 11-19 implement a min-conflict
search with a simple tabu-search component. Observe that the modeling component
in this program is similar to a traditional constraint programming solution.
Interestingly, local search performs reasonably well on this problem as indicated
 1. int n = 400;
 2. range Size = 0..n-1;
 3. LocalSolver m();
 4. inc{int} magic[i in Size](m, Size) := 0;
 5. int tabu[i in Size] = -1;
 6. ConstraintSystem S(m);
 7. forall(v in Size)
 8.   S.post(exactly(magic[v], all(i in Size) magic[i] == v));
 9. S.post(sum(i in Size) i * magic[i] == n);
10. m.close();
11. int it = 0;
12. while (S.violations() > 0) {
13.   selectMax(s in Size: tabu[s] < it)(S.getViolations(magic[s]))
14.     selectMin(v in Size: magic[s] != v)(S.getAssignDelta(magic[s], v)) {
15.       magic[s] := v;
16.       tabu[s] = it + 3;
17.     }
18.   it = it + 1;
19. }

Fig. 3. A Comet Program for the Magic Series Problem.
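The constraints posted in lines 8 and 9 of Figure 3 can be checked on a concrete series with a short Python sketch. It assumes that the violation degree of `exactly` is the absolute difference between the number of constraints that hold and the required count, following the prose definition of the operator; the function names are hypothetical.

```python
def magic_violations(s):
    """Violation degree of each exactly(s[v], [s[i] == v for i]) constraint."""
    return [abs(sum(1 for x in s if x == v) - s[v]) for v in range(len(s))]

def redundant_ok(s):
    """Redundant constraint from line 9: the sum of i * s[i] equals n."""
    return sum(i * x for i, x in enumerate(s)) == len(s)

series = [6, 2, 1, 0, 0, 0, 1, 0, 0, 0]  # a magic series for n = 10
start = [0] * 10                          # the initial assignment of line 4
```

Here `magic_violations(series)` is all zeros, while the all-zero starting point violates only the constraint for value 0, which is the signal the search in lines 12-19 uses to pick the variable to change.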



Table 2. Performance Results on the Magic Series Program.

n             10      30      50      70      90     110     130     150     170     190     210
best(T)     0.00    0.03    0.09    0.21    0.41    0.57    0.84    1.09    1.44    2.09    3.20
μ(T)        0.01    0.09    0.41    1.05    1.78    5.70   13.58   21.70   47.60   67.83  150.41
worst(T)    0.02    0.37    1.57    7.35   10.85   30.95  102.63   86.54  347.77  400.20  761.11
σ(T)        0.01    0.12    0.56    1.73    2.84    9.12   22.87   31.38   81.80  110.71  240.85



in Table 2. The table gives the best, average, and worst times in seconds for 50
runs on a 2.4GHz Pentium, as well as the standard deviation.

The contributions here are twofold. On the one hand, Comet naturally ac- 
commodates logical and cardinality operators as differentiable objects, allowing 
very similar modelings for constraint programming and local search. On the other 
hand, implementations of logical/cardinality operators directly exploit incremen- 
tal algorithms for the constraints they combine, providing compositionality both 
at the language and implementation level. The implementations can in fact be 
shown optimal in terms of the input/output incremental model [14], assuming 
optimality of the incremental algorithms for the composed constraints. 

Weighted Constraints. Many local search algorithms (e.g., [15, 4, 19]) use weights
to focus the search on some subsets of constraints. Comet supports weight
specifications which can be either static or dynamic (dynamic weights vary during
the search). Weights can be specified with the * operator. For instance, the snippet

    Constraint c = 2 * allDifferent(x);
 1. LocalSolver m();
 2. UniformDistribution distr(Hosts);
 3. inc{int} boat[Guests, Periods](m, Hosts) := distr.get();
 4. int tabu[Guests, Periods, Hosts] = -1;
 5. ConstraintSystem S(m);
 6. forall(g in Guests)
 7.   S.post(2 * allDifferent(all(p in Periods) boat[g,p]));
 8. forall(p in Periods)
 9.   S.post(2 * knapsack(all(g in Guests) boat[g,p], crew, cap));
10. forall(i in Guests, j in Guests : j > i)
11.   S.post(atmost(1, all(p in Periods) boat[i,p] == boat[j,p]));
12. m.close();

Fig. 4. The Modeling Part of a Comet Program for the Progressive Party Problem.

Table 3. Experimental Results for the Progressive Party Problem.

Hosts/Periods   6             7             8             9              10
1-12,16         0.32 (0.30)   0.41 (0.35)   0.60 (0.54)   1.01 (0.69)    3.74 (1.27)
1-13            0.41 (0.34)   0.77 (0.46)   3.15 (0.99)   42.22 (5.80)
1,3-13,19       0.41 (0.33)   0.79 (0.47)   3.50 (0.92)   28.6 (6.39)
3-13,25,26      0.44 (0.35)   0.93 (0.50)   4.53 (1.30)   65.32 (7.79)
1-11,19,21      1.75 (0.54)   36.1 (2.86)
1-9,16-19       2.81 (1.06)   95.4 (4.81)



associates a constant weight of 2 to an allDifferent constraint and returns 
an object implementing the Constraint interface. Its main effect is to modify 
the violation degree of the constraint and the results of its differentiable API. 
Weights can also be specified by incremental variables, which is useful in many 
applications where weights are updated after each local search iteration. This
feature is illustrated later in the paper in a frequency allocation application. 

Figure 4 depicts the modeling component of a Comet program to solve the 
progressive party problem. It illustrates both constant weights and the cardi- 
nality operator atmost. Line 7 expresses weighted allDifferent constraints 
to ensure that a party never visits the same boat twice. Line 9 posts weighted 
knapsack constraints to satisfy the capacity constraint on the boats. Finally, line 
11 uses a cardinality constraint to specify that no two parties meet more than 
once over the course of the event. The cardinality operator makes for a very 
elegant modeling: it removes the need for the rather specific meetAtmostOnce 
constraint used in an earlier version of the Comet program [9]. It also indicates
the ubiquity of cardinality constraints for expressing, concisely and naturally, 
complex constraints arising in practical applications. 

Table 3 describes experimental results for this modeling and the search pro- 
cedure of [9] augmented with a restarting component that resets the current 
solution to a random configuration every 100,000 iterations in order to eliminate 
outlier runs. The table describes results for various configurations of hosts and 
various numbers of periods. The results report the average running times over 
1. SatisfactionSystem S(m);
2. forall(d in DistanceCtrs)
3.   switch (d.ty) {
4.     case 1: S.post(abs(freq[d.v1] - freq[d.v2]) == d.rhs); break;
5.     case 2: S.post(abs(freq[d.v1] - freq[d.v2]) > d.rhs); break;
6.     case 3: S.post(abs(freq[d.v1] - freq[d.v2]) < d.rhs); break;
7.   }

Fig. 5. Partial Constraint Satisfaction in Frequency Allocation.



50 runs of the algorithm, as well as the best times in parentheses. With the
addition of a restarting component, the standard deviation for the hard instances
is always quite low (e.g., configuration 1-9,16-19 with 7 periods has a mean of
95.4 and a deviation of 4.8), which shows that the algorithm's behavior is quite
robust. The results were obtained on a 2.4GHz Pentium 4 running Linux. The
performance of the program is excellent, as the cardinality operator does not
impose any significant overhead. Once again, the implementation can be shown
incrementally optimal if the underlying constraints support optimal algorithms.

Partial Constraint Satisfaction. Some local search algorithms do not rely on 
violation degrees; rather they reason on the truth values of the constraints only. 
This is the case, for instance, in partial constraint satisfaction [3], where the 
objective is to minimize the (possibly weighted) number of constraint violations. 
Comet provides an operator to transform an arbitrary constraint into a satis- 
faction constraint, i.e., a constraint whose violation degree is 0 or 1 only. For 
instance, the snippet 

    Constraint c = satisfactionConstraint(allDifferent(x));

assigns to c the satisfaction counterpart of the allDifferent constraint. The key
contribution here is the systematic derivation of the satisfaction implementation 
in terms of the original constraint interface. The resulting implementation only 
induces a small constant overhead. 

Comet also supports satisfaction systems that systematically apply the sat- 
isfaction operator to the constraints posted to them, simplifying the modeling 
and the declarative reading. Figure 5 illustrates satisfaction systems on an ex- 
cerpt from a frequency allocation problem, where the objective is to minimize the 
number of violated constraints. Line 1 declares the satisfaction system S, while 
lines 4, 5, and 6 post the various types of distance constraints to the system. 

3 Objective Functions 

We now turn to objective functions, another class of differentiable objects in 
Comet. Objective functions may be linear, nonlinear, or may capture combi- 
natorial substructures arising in many applications. A typical example is the 
objective function MinNbDistinct(x[1], ..., x[n]) which minimizes the number
of distinct values in x[1], ..., x[n] and arises in graph coloring, frequency
allocation, and other resource allocation problems. 
    Interface Objective {
      inc{int}[] getVariables();
      inc{int} value();
      inc{int} cost();
      int getCost(inc{int} x);
      int getAssignDelta(inc{int} x, int v);
      int getSwapDelta(inc{int} x1, inc{int} x2);
      int getAssignDelta(inc{int}[] x, int[] v);
    }

Fig. 6. The Objective Interface in Comet (Partial Description).

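Table 4. The Semantics of Some Objective Functions.

Objective                      Value                                              Cost
-----------------------------  -------------------------------------------------  -------------------------------------------------
MinNbDistinct(x[E])            $\#\{x[e] \mid e \in E\}$                          $-\sum_i |\{e \in E \mid x[e] = i\}|^2$
MinNbDistinct(x[E], w[E])      $\#\{x[e] \mid e \in E\}$                          [illegible]
MaxNbDistinct(x[E], w[V])      $\#\{x[e] \mid e \in E\}$                          [illegible]
condSum(b[E], c[E])            $\sum_{e \in E : b[e]} c[e]$                       $\sum_{e \in E : b[e]} c[e]$
minAssignment(b[V], c[E,V])    $\sum_{e \in E} \max_{v \in V : b[v]} c[e, v]$     $\sum_{e \in E} \max_{v \in V : b[v]} c[e, v]$
constraintAsObjective(c)       $v(c)$                                             $v(c)$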



This section illustrates how objective functions, like constraints, can be com- 
bined naturally to build more complex objectives. Once again, this desirable 
functionality comes from the Objective interface depicted in Figure 6 and im- 
plemented by all objective functions. In particular, the interface gives access to 
their variables, to two incremental variables, and the differentiable API. The 
first incremental variable (available through method value) maintains the value 
of the function incrementally. The second variable (available through method 
cost) maintains the cost of the function which may, or may not, differ from its 
value. The cost is useful to guide the local search more precisely by distinguish- 
ing states which have the same objective value. Consider again the objective 
function nbDistinct. Two solutions may use the same number of values, yet 
one of them may be closer than the other to a solution with fewer values. To 
distinguish between these, the cost may favor solutions where some values are 
heavily used while others have few occurrences. For instance, such a cost is used 
for graph coloring in [6] and is shown in Table 4, which summarizes the objective
functions discussed in this paper. The differentiable API returns the variation
of the cost induced by assignments and swaps. The method getCost also re- 
turns the contribution of a variable to the cost and is the counterpart of method 
getViolations for objective functions. 
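The difference between value and cost can be made concrete with a small Python sketch. It assumes the cost of MinNbDistinct is minus the sum of the squared class sizes, which is how the first row of Table 4 reads; the function names and the two sample assignments are hypothetical.

```python
from collections import Counter

def nb_distinct_value(x):
    """Objective value: the number of distinct values used."""
    return len(set(x))

def nb_distinct_cost(x):
    """Cost: minus the sum of squared class sizes (lower is better)."""
    return -sum(count ** 2 for count in Counter(x).values())

balanced = [1, 1, 2, 2, 3, 3]  # three values, evenly used
skewed = [1, 1, 1, 1, 2, 3]    # three values, one dominant
```

Both assignments have value 3, but the costs are -12 for `balanced` and -18 for `skewed`: the cost prefers the assignment in which two values are nearly unused and therefore closer to being eliminated.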
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Arithmetic Operators. Objective functions can be combined using traditional 
arithmetic operators. For instance, the excerpt 

   Objective f = condSum(open, fixed) + minAssignment(open, transportionCost);

illustrates the addition of two objective functions capturing combinatorial sub- 
structures expressing the fixed and transportation costs in uncapacitated ware- 
house location. These functions are useful in a variety of other applications such 
as k-median and configuration problems. See also [10] for a discussion of efficient 
incremental algorithms to implement them. 

Reification: Constraints as Objective Functions. Some of the most challenging 
applications in combinatorial optimization feature complex feasibility constraints 
together with a “global” objective function. Some algorithms approach these 
problems by relaxing some constraints and integrating them in the objective 
function. To model such local search algorithms, Comet provides the generic
operator constraintAsObjective that transforms a constraint into an objective
function whose value and cost are the violation degree of the constraint.
Moreover, traditional arithmetic operators transparently apply this operator to 
simplify the declarative reading of the model. The implementation of this com- 
binator, which is also given in Table 4, is in fact trivial, since it only maps one 
interface in terms of the other. 
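
The two combinators described above (addition of objectives and reification of a constraint) can be mimicked in a few lines of Python. This is only an illustrative sketch, not Comet: the class names (`Objective`, `Sum`, `ConstraintAsObjective`, `AllDifferent`, `NbDistinct`), the dictionary-based assignment, and the violation measure chosen for `AllDifferent` are assumptions made for the example, and the delta is obtained by naive re-evaluation rather than by an incremental algorithm.

```python
class Objective:
    """Minimal objective interface: a value, a cost, and an assignment delta."""

    def value(self, a):
        raise NotImplementedError

    def cost(self, a):
        return self.value(a)  # by default the cost equals the value

    def assign_delta(self, a, var, val):
        # Naive differentiation: re-evaluate the cost on a modified copy.
        b = dict(a)
        b[var] = val
        return self.cost(b) - self.cost(a)

    def __add__(self, other):
        return Sum(self, as_objective(other))

    def __radd__(self, other):
        return Sum(as_objective(other), self)


class Sum(Objective):
    """Addition of two objectives: values and costs are added."""

    def __init__(self, f, g):
        self.f, self.g = f, g

    def value(self, a):
        return self.f.value(a) + self.g.value(a)

    def cost(self, a):
        return self.f.cost(a) + self.g.cost(a)


class ConstraintAsObjective(Objective):
    """Reification: value and cost are both the violation degree."""

    def __init__(self, c):
        self.c = c

    def value(self, a):
        return self.c.violations(a)


class AllDifferent:
    """Toy constraint; violations = number of variables minus distinct values."""

    def __init__(self, variables):
        self.variables = variables

    def violations(self, a):
        vals = [a[v] for v in self.variables]
        return len(vals) - len(set(vals))

    def __add__(self, other):
        # Arithmetic operators apply the reification transparently.
        return Sum(ConstraintAsObjective(self), as_objective(other))


class NbDistinct(Objective):
    """Toy objective: number of distinct values used by the variables."""

    def __init__(self, variables):
        self.variables = variables

    def value(self, a):
        return len({a[v] for v in self.variables})


def as_objective(x):
    return x if isinstance(x, Objective) else ConstraintAsObjective(x)


a = {"x": 1, "y": 1, "z": 2}
obj = AllDifferent(["x", "y"]) + NbDistinct(["x", "y", "z"])
print(obj.cost(a))                  # 1 violation + 2 distinct values = 3
print(obj.assign_delta(a, "y", 2))  # -1: the violation disappears
```

The search code only ever talks to `obj`, which is the point of the combinators: the same `assign_delta` call works whether `obj` is a single function or a sum involving a reified constraint.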

Frequency Allocation. To illustrate objective functions, consider a frequency al- 
location problem where feasibility constraints impose some distance constraints 
on frequencies and where the objective function consists of minimizing the num- 
ber of frequencies used in the solution. We present an elegant Comet program 
implementing the Guided Local Search (GLS) algorithm proposed in [19]¹. The
key idea underlying the GLS algorithm is to iterate a simple local search where 
feasibility constraints are integrated inside the objective function. After com- 
pletion of each local search phase, the weights of the violated constraints are 
updated to guide the search toward feasible solutions. If the solution is feasi- 
ble, the weights of some frequencies used in the solutions (e.g., the values that 
occur the least) are increased to guide the search toward solutions with fewer 
frequencies. 

Figure 7 depicts the modeling part of the GLS algorithm. Line 2 declares the
decision variables, while lines 3 and 4 create the incremental variables representing
the weights that are associated with constraints and values respectively. Line
5 declares the satisfaction system S and lines 6-11 post the distance constraints
in S, as presented earlier in the paper. The only difference is the use of weights,
which are useful to focus the search on violated constraints. Line 12 declares the
objective function nbFreq which minimizes the number of distinct frequencies.
Note that this function receives, as input, a dynamic weight for each value to
guide the search toward solutions with few frequencies. Line 13 is particularly

1 The point is not to present the most efficient algorithm for this application, but to
illustrate the concepts introduced herein on an interesting algorithm/application.

1.  LocalSolver m();
2.  inc{int} freq[RV](m);
3.  inc{int} w[RC](m) := 0;
4.  inc{int} fw[RF](m) := 0;
5.  SatisfactionSystem S(m);
6.  forall(d in DistanceCtrs)
7.     switch (d.ty) {
8.        case 1: S.post(abs(freq[d.v1]-freq[d.v2]) == d.rhs, w[d.id]); break;
9.        case 2: S.post(abs(freq[d.v1]-freq[d.v2]) > d.rhs, w[d.id]); break;
10.       case 3: S.post(abs(freq[d.v1]-freq[d.v2]) < d.rhs, w[d.id]); break;
11.    }
12. MinNbDistinct nbFreq(freq,fw);
13. Objective obj = S + nbFreq;
14. m.close();

Fig. 7. The Modeling Part of a Comet Program for Frequency Allocation.



Table 5. Experimental Results for Frequency Allocation.

      number of frequencies        time (seconds)                iterations
id    avg     std    best  worst   avg    std    best   worst    avg      std
1     18.76   2.15   16    24      4.40   2.50   1.84   10.35    778.60   420.02
2     14.00   0.00   14    14      0.69   1.06   0.27    6.20    152.72   257.28
3     15.36   1.23   14    18      2.69   1.96   0.69    9.10    523.40   383.69



interesting: it defines the GLS objective function obj as the sum of the satis- 
faction system (a constraint viewed as an objective function) and the “actual” 
objective nbFreq. Of course, the search component uses the objective obj. 

Figure 8 depicts the search part of the algorithm. Function GLS iterates two
steps for a number of iterations: a local search, depicted in lines 8-19, and the
weight adjustment. The weight adjustment is not difficult: it simply increases
the weights of violated constraints and, if the solution is feasible, the weights of
the frequencies that are used the least in the solution (modulo a normalization
factor [19]). The local search considers all variables in a round-robin fashion and
applies function moveBest on each of them, until a round does not improve the
objective function (lines 8-14). Function moveBest selects, for variable freq[v],
the frequency f that minimizes the value of the objective function (ties are
broken randomly). It uses the differentiable API to evaluate moves quickly.

It is particularly interesting to observe the simplicity and elegance of the 
Comet program. The modeling component simply specifies the constraints and 
the objective function, and combines them to obtain the GLS objective. The 
search component is expressed at a high level of abstraction as well, only relying 
on the objective function and the decision variables. Table 5 depicts the exper- 
imental results on the first three instances of the Celar benchmarks. It reports 
quality and efficiency results for 50 runs of the algorithms on a 2.4GHz Pentium 
IV. In particular, it gives the average, standard deviation, and the best and 
worst values for the number of frequencies and the time to the best solutions 
in seconds. It also reports the average number of iterations and its standard 
deviation. The preliminary results indicate that the resulting Comet program,

1.  function void GLS() {
2.     while (it < maxIterations) {
3.        localSearch();
4.        updateWeights();
5.        it++;
6.     }
7.  }
8.  function void localSearch() {
9.     int old;
10.    do {
11.       old = obj.cost();
12.       forall(v in RV) moveBest(v);
13.    } while (old != obj.cost());
14. }
15. function void moveBest(int v) {
16.    if (obj.getCost(freq[v]) > 0)
17.       selectMin(f in Domain[v])(obj.getAssignDelta(freq[v],f))
18.          freq[v] := f;
19. }

Fig. 8. The Search Part of a Comet Program for Frequency Allocation.



despite its simplicity and elegance, compares well in quality and efficiency with 
specialized implementations in [19]. Note that there are many opportunities for 
improvements to the algorithm. 

4 First-Class Expressions 

We now turn to first-class expressions, another significant abstraction which is
also an integral part of constraint programming libraries [5]. First-class expressions
are constructed from incremental variables, constants, and arithmetic, logical,
and relational operators. In Comet, first-class expressions are differentiable
objects which can be evaluated to determine the effect of assignments and swaps
on their values. In fact, several examples presented earlier feature first-class
expressions. For instance, the Comet code

   S.post(allDifferent(all(i in Size) queen[i] + i));

from the n-queens problem can be viewed as a shortcut for

   expr{int} d[i in Size] = queen[i] + i;
   S.post(allDifferent(d));

The first instruction declares an array of first-class integer expressions, element
d[i] being the expression queen[i] + i. The second instruction states the
allDifferent constraint on the expression array. As mentioned earlier, expressions
are differentiable objects, which can be queried to evaluate the effect
of assignments and swaps on their values. For instance, the method call
d[i].getAssignDelta(queen[i],5) returns the variation in the value of expression
d[i] when queen[i] is assigned the value 5.

1.  int n = 40;
2.  range Size = 1..n;
3.  range Domain = 0..n-1;
4.  range SD = 1..n-1;
5.  LocalSolver m();
6.  RandomPermutation perm(Domain);
7.  inc{int} v[Size](m,Domain) := perm.get();
8.  MaxNbDistinct obj(all(k in SD) abs(v[k+1]-v[k]));
9.  m.close();
10. while (obj.value() < n-1)
11.    select(i in Size: obj.getCost(v[i]) > 0)
12.       selectMax(j in Size: j != i)(obj.getSwapDelta(v[i],v[j]))
13.          v[i] :=: v[j];

Fig. 9. A Comet Program for the All-Interval Series Problem.
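
The getAssignDelta query on the expression d[i] = queen[i] + i can be illustrated with a small Python sketch. It is not Comet's implementation: the classes `Expr`, `Var`, `Const` and `Add`, and the idea of evaluating under an override map, are assumptions chosen to keep the example short.

```python
class Expr:
    """Base class of a tiny expression language over mutable variables."""

    def __add__(self, other):
        return Add(self, lift(other))

    def get_assign_delta(self, var, val):
        # Variation of the expression value if `var` were assigned `val`.
        return self.eval({var: val}) - self.eval({})


class Var(Expr):
    def __init__(self, value):
        self.value = value

    def eval(self, override):
        # Use the tentative value if one is given, else the current value.
        return override.get(self, self.value)


class Const(Expr):
    def __init__(self, c):
        self.c = c

    def eval(self, override):
        return self.c


class Add(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def eval(self, override):
        return self.left.eval(override) + self.right.eval(override)


def lift(x):
    """Wrap plain integers as constants."""
    return x if isinstance(x, Expr) else Const(x)


queen = [Var(r) for r in (2, 0, 3, 1)]
d = [queen[i] + i for i in range(4)]
print(d[1].get_assign_delta(queen[1], 5))  # (5 + 1) - (0 + 1) = 5
print(d[0].get_assign_delta(queen[1], 5))  # 0: d[0] does not mention queen[1]
```

Nothing is modified by the query; the variables keep their values until a move is actually committed.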

First-class expressions significantly increase the modeling power of the lan- 
guage, since constraints and objective functions can now be defined over complex 
expressions, not incremental variables only. Moreover, efficient implementations 
of these enhanced versions can be obtained systematically by combining the dif- 
ferentiable APIs of constraints, objective functions, and first-class expressions. 
For instance, in the queens problem, the implementation can be thought of as

(1) defining an intermediary set of incremental variables

   inc{int} q[i in Size] <- queen[i] + i;

(2) specifying an allDifferent(q) constraint on these variables and (3) implementing
the differentiable API as a composition of the differentiable APIs of the
expressions queen[i] + i and of the allDifferent constraint.

The all-interval series problem [1] illustrates the richness of first-class expressions
in Comet. (The n-queens problem only illustrates a simple use of first-class
expressions, since every variable occurs in exactly one expression.) The problem
is a well-known exercise in music composition. It consists of finding a sequence
of notes such that all notes in the sequence, as well as the tonal intervals between
consecutive notes, are different. The all-interval series problem can thus be modeled
as finding a permutation of the first n integers such that the absolute
differences between consecutive numbers are all different.

Figure 9 depicts a Comet program solving the all-interval series problem.
The basic idea behind the modeling is to maximize the number of different
distances in abs(v[2]-v[1]),...,abs(v[n]-v[n-1]). Line 7 declares the variables
v[i] in the series, which are initialized to a random permutation of 0..n-1.
This guarantees that all variables have distinct values, a property maintained by
the search procedure whose local moves swap the values of two variables. Line 8
specifies the objective function MaxNbDistinct which maximizes the number of
distinct distances in abs(v[2]-v[1]),...,abs(v[n]-v[n-1]). It is important
to observe that almost all variables occur in two expressions. As a consequence,
it is non-trivial to evaluate the impact of a swap on the objective function, since
this may involve up to four specific expressions. The Comet implementation abstracts
this tedious and error-prone aspect of the local search through first-class
expressions and the combinatorial function MaxNbDistinct.

Table 6. Performance Results on the All-Interval Series Program.

n     μ(Time)   best(Time)   worst(Time)   σ(Time)   μ(Iter)       σ(Iter)
10      0.00      0.00          0.01         0.00         25.18        38.27
15      0.01      0.00          0.06         0.02        203.92       276.25
20      0.04      0.00          0.26         0.07        859.02      1235.77
25      0.07      0.00          0.52         0.10       1183.30      1419.42
30      0.39      0.01          2.41         0.64       6092.78      7985.98
35      0.86      0.03          6.75         1.49      11864.52     16792.77
40      2.62      0.09         18.68         4.05      32669.46     38641.32
45      6.63      0.12         36.79         9.24      78961.70     75349.35
50     34.09      1.27        165.41        52.56     355816.56    416601.45
55    130.05      9.30        278.75       160.56    1277633.38    906189.43
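
To make concrete why evaluating a swap touches only a handful of expressions, here is a hedged Python sketch of a swap delta for the number of distinct intervals. The function names `intervals`, `nb_distinct` and `swap_delta` are hypothetical, and the counter is rebuilt on every call for brevity, whereas an incremental implementation would maintain it between moves.

```python
from collections import Counter


def intervals(v):
    """Absolute differences between consecutive elements."""
    return [abs(v[k + 1] - v[k]) for k in range(len(v) - 1)]


def nb_distinct(v):
    return len(set(intervals(v)))


def swap_delta(v, i, j):
    """Change in nb_distinct(v) if v[i] and v[j] were swapped."""
    n = len(v)
    counts = Counter(intervals(v))  # rebuilt here only to keep the sketch short
    before = len(counts)
    # Only the intervals adjacent to positions i and j can change (at most four).
    affected = {k for p in (i, j) for k in (p - 1, p) if 0 <= k < n - 1}
    w = list(v)
    w[i], w[j] = w[j], w[i]
    for k in affected:  # retract the old distances
        d = abs(v[k + 1] - v[k])
        counts[d] -= 1
        if counts[d] == 0:
            del counts[d]
    for k in affected:  # add the new distances
        counts[abs(w[k + 1] - w[k])] += 1
    return len(counts) - before


series = [0, 7, 1, 6, 2, 5, 3, 4]
print(nb_distinct(series))  # 7: distances 7, 6, 5, 4, 3, 2, 1 are all different
```

A search loop in the spirit of Figure 9 would call `swap_delta` for each candidate j and commit the swap with the largest delta.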



The performance of the algorithm can be improved by associating
weights with the distances (as suggested in [1]). The justification is that larger
distances are much more difficult to obtain and it is beneficial to bias the search
toward them. To accommodate this enhancement, it suffices to replace line 8 by

   MaxNbDistinct obj(all(k in SD) abs(v[k+1]-v[k]), all(k in SD) k^3);

Table 6 gives the experimental results for various values of n and for the Comet
program, extended to restart the computation after 1,000 iterations if no solution
was found. The resulting Comet program significantly outperforms existing programs.
The Java implementation from [1] takes an average time of 63 seconds for n = 30
on a 733MHz Pentium III (instead of 0.39 seconds for Comet on a 2.4GHz
machine). The gain probably comes from the better incrementality of Comet,
which uses differentiation to evaluate moves quickly. Indeed, our first Comet
program for this problem, which did not use first-class expressions and did not
support the differentiable API for the objective function, took 10.44 seconds
on average over 50 runs for n = 30. The incrementality of the Comet program
presented here is obtained directly from the composition of differentiable objects,
whereas it is tedious to derive manually.



5 Generic Search Procedures 

The combinators presented in this paper have an additional benefit: they provide 
the foundation for writing generic search procedures in Comet. Indeed, search 
procedures are now able to interact with declarative components only through 
the Constraint and Objective interfaces, abstracting away the actual details 
of constraints and objective functions. These generic search procedures do not 
depend on the application at hand, yet they exploit the differentiable APIs to 
implement heuristics and meta-heuristics efficiently. Moreover, these APIs are
implemented by efficient incremental algorithms exploiting the structure of the 
applications. In other words, although the search procedures are generic and 
do not refer to specificities of the applications, they exploit their underlying 
structure through the combinators.

1. function void minConflictSearch(ConstraintSystem S) {
2.    inc{int}[] var = S.getVariables();
3.    range Size = var.getRange();
4.    while (S.violations() > 0)
5.       selectMax(i in Size)(S.getViolations(var[i]))
6.          selectMin(v in var[i].getDomain())(S.getAssignDelta(var[i],v))
7.             var[i] := v;
8. }

Fig. 10. A Generic Min-Conflict Search in Comet.

1.  function int cdSearch(ConstraintSystem S) {
2.     UniformDistribution noise(0..99);
3.     range C = S.getRange();
4.     Constraint c[i in C] = S.getConstraint(i);
5.     while (S.violations() > 0)
6.        select(i in C: !c[i].isTrue()) {
7.           inc{int}[] var = c[i].getVariables();
8.           range RV = var.getRange();
9.           selectMax(v in RV)(c[i].getViolations(var[v]))
10.             selectMin(val in var[v].getDomain())
                      (c[i].getAssignDelta(var[v],val))
11.                if (S.getAssignDelta(var[v],val) < 0)
12.                   var[v] := val;
13.                else if (noise.get() < 10) {
14.                   var[v] := val;
15.                }
16.       }
17. }

Fig. 11. A Generic Constraint-Directed Search in Comet.

Figure 10 depicts a min-conflict search in Comet. The code is essentially 
similar to the search procedure in the queens problem: it is only necessary to 
collect the variable array and its range in lines 2-3, and to use the variable do- 
mains in line 6. As a consequence, lines 10-13 in the queens problem can simply 
be replaced by a call to minConflictSearch(S). Similarly, Figure 11 implements
a constraint-directed search inspired by search procedures used in WSAT [15, 
20] and DragonBreath [13]. The key idea is to select a violated constraint 
first (line 6) and then a variable to re-assign (line 9). Once again, lines 10-13 
in the queens problem can simply be replaced by a call cdSearch(S). Such 
constraint-oriented local search procedures are particularly effective on a variety 
of problems, such as the ACC sport-scheduling problem. The actual search pro- 
cedure in [20] was also successfully implemented in Comet and applied to their 
integer formulation. 
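
The following Python sketch mirrors the structure of the generic min-conflict search of Figure 10. It is an illustration under stated assumptions, not a port of Comet: the interface methods (`variables`, `domain`, `violations`, `var_violations`, `assign_delta`, `assign`) and the `Queens` class are hypothetical names, violations are counted as attacking pairs, and an iteration cap is added so that the function always returns.

```python
import random


class Queens:
    """n-queens exposed only through a generic constraint-system interface."""

    def __init__(self, n, rng):
        self.n = n
        self.row = [rng.randrange(n) for _ in range(n)]  # row[i]: queen in column i

    def variables(self):
        return range(self.n)

    def domain(self, i):
        return range(self.n)

    def _conflicts(self, i, r):
        # Number of other queens attacking square (column i, row r).
        return sum(1 for j in range(self.n) if j != i and
                   (self.row[j] == r or abs(self.row[j] - r) == abs(j - i)))

    def violations(self):
        # Every attacking pair is seen from both sides, hence the division.
        return sum(self._conflicts(i, self.row[i]) for i in range(self.n)) // 2

    def var_violations(self, i):
        return self._conflicts(i, self.row[i])

    def assign_delta(self, i, r):
        return self._conflicts(i, r) - self._conflicts(i, self.row[i])

    def assign(self, i, r):
        self.row[i] = r


def min_conflict_search(S, rng, max_iters=2000):
    """Generic search: it only uses the interface, never the problem structure."""
    for _ in range(max_iters):
        if S.violations() == 0:
            return True
        # selectMax: a most violated variable, ties broken randomly.
        worst = max(S.var_violations(i) for i in S.variables())
        i = rng.choice([k for k in S.variables() if S.var_violations(k) == worst])
        # selectMin: a value with the best delta, ties broken randomly.
        best = min(S.assign_delta(i, v) for v in S.domain(i))
        v = rng.choice([u for u in S.domain(i) if S.assign_delta(i, u) == best])
        S.assign(i, v)
    return S.violations() == 0


rng = random.Random(0)
S = Queens(8, rng)
print(min_conflict_search(S, rng), S.row)
```

Any other class offering the same six methods could be passed to `min_conflict_search` unchanged, which is the sense in which the procedure is generic.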

Observe that these search procedures are generic and do not depend on the 
actual shape of the constraints, which can be propositional, linear, nonlinear, 
combinatorial, or any combination of these. Moreover, generic search procedures 
bring another interesting benefit of constraint programming to local search: the 
ability to provide a variety of (parameterized) default search procedures, while 
exploiting the specific structure of the application.

6 Conclusion

This paper aimed at demonstrating that constraint-based combinators from con- 
straint programming are natural abstractions for expressing local search algo- 
rithms. In particular, it showed that logical and cardinality operators, reification, 
and first-class expressions can all be viewed as differentiable objects encapsulat- 
ing efficient incremental algorithms. These combinators, and their counterparts 
for objective functions, provide high-level ways of expressing complex ad-hoc 
constraints, generic search procedures, and partial constraint satisfaction. They 
were also shown to be amenable to efficient implementations. As a consequence, 
this paper, together with earlier results, indicates that the rich language of con- 
straint programming is a natural vehicle for writing a wide variety of modular, 
extensible, and efficient local search programs.

Abstract. Scheduling is one of the most successful application areas
of constraint programming, mainly thanks to special global constraints
designed to model resource restrictions. Among these global constraints,
the edge-finding filtering algorithm for unary resources is one of the most
popular techniques. In this paper we propose a new O(n log n) version of
the edge-finding algorithm that uses a special data structure called the
Θ-Λ-tree. This data structure is especially designed for “what-if” reasoning
about a set of activities, so we also propose to use it for handling
so-called optional activities, i.e. activities which may or may not appear
on the resource. In particular, we propose new O(n log n) variants of
filtering algorithms which are able to handle optional activities: overload
checking, detectable precedences and not-first/not-last.



1 Introduction 

In scheduling, a unary resource is an often used generalization of a machine (or
a job in openshop). A unary resource models a set of non-interruptible activities
T which must not overlap in a schedule.

Each activity $i \in T$ has the following requirements:

- earliest possible starting time $\mathrm{est}_i$

- latest possible completion time $\mathrm{lct}_i$

- processing time $p_i$

A (sub)problem is to find a schedule satisfying all these requirements. One
of the most used techniques to solve this problem is constraint programming.

In constraint programming, we associate a unary resource constraint with
each unary resource. The purpose of such a constraint is to reduce the search space by
tightening the time bounds $\mathrm{est}_i$ and $\mathrm{lct}_i$. This process of elimination of infeasible
values is called propagation; an actual propagation algorithm is often called a
filtering algorithm.

Naturally, it is not efficient to remove all infeasible values. Instead, it is
customary to use several fast but incomplete algorithms which can find only
some of the impossible assignments. These filtering algorithms are repeated in every
node of a search tree, therefore their speed and filtering power are crucial.

The filtering algorithms considered in this paper are:

Edge-finding. Paper [5] presents an O(n log n) version; two other O(n^2) versions
of edge-finding can be found in [7, 8].

Not-first/not-last. An O(n log n) version of the algorithm can be found in [10];
two older O(n^2) versions are in [2, 9].

Detectable precedences. This O(n log n) algorithm was introduced in [10].

All these filtering algorithms can be used together to join their filtering powers.

This paper introduces a new version of the edge-finding algorithm with time
complexity O(n log n). Experimental results show that this new edge-finding
algorithm is faster than the quadratic algorithms [7, 8] even for n = 15. Another
asset of the algorithm is the introduction of the Θ-Λ-tree, a data structure
which can be used to extend filtering algorithms to handle optional activities.

2 Edge-Finding Using Θ-Λ-Tree

2.1 Basic Notation 

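Let us establish the basic notation concerning a subset of activities. Let T be
the set of all activities on the resource and let $\Theta \subseteq T$ be an arbitrary non-empty
subset of activities. The earliest starting time $\mathrm{est}_\Theta$, the latest completion time $\mathrm{lct}_\Theta$
and the processing time $p_\Theta$ of the set $\Theta$ are defined as:

$$\mathrm{est}_\Theta = \min\{\mathrm{est}_j,\ j \in \Theta\}$$
$$\mathrm{lct}_\Theta = \max\{\mathrm{lct}_j,\ j \in \Theta\}$$
$$p_\Theta = \sum_{j \in \Theta} p_j$$

Often we need to estimate the earliest completion time of a set $\Theta$:

$$\mathrm{ECT}_\Theta = \max\{\mathrm{est}_{\Theta'} + p_{\Theta'},\ \Theta' \subseteq \Theta\} \qquad (1)$$

To extend the definitions also to $\Theta = \emptyset$, let $\mathrm{est}_\emptyset = -\infty$, $\mathrm{lct}_\emptyset = \infty$, $p_\emptyset = 0$ and
$\mathrm{ECT}_\emptyset = -\infty$.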
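
Definition (1) ranges over all subsets, but the maximum is always reached by a subset that contains every activity starting no earlier than some est value, so a single scan in decreasing est order suffices. The sketch below shows this; the function names `ect` and `ect_brute` and the (est, p) tuple layout are assumptions made for illustration, and the brute-force version is included only as a cross-check.

```python
from itertools import combinations

NEG_INF = float("-inf")


def ect(acts):
    """ECT of a set of activities given as (est, p) pairs, per definition (1)."""
    best, tail = NEG_INF, 0
    # Scan by decreasing est: `tail` is the total processing time of the
    # activities seen so far, all of which start no earlier than the current one.
    for est, p in sorted(acts, key=lambda a: a[0], reverse=True):
        tail += p
        best = max(best, est + tail)
    return best  # -inf for the empty set, as in the text


def ect_brute(acts):
    """Direct transcription of (1): try every non-empty subset."""
    best = NEG_INF
    for r in range(1, len(acts) + 1):
        for sub in combinations(acts, r):
            best = max(best, min(e for e, _ in sub) + sum(p for _, p in sub))
    return best


acts = [(0, 5), (25, 9), (30, 5), (32, 5)]
print(ect(acts))        # 44, reached by the three activities with est >= 25
print(ect_brute(acts))  # 44
```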

2.2 Edge-Finding Rules 

Edge-finding is probably the most often used filtering algorithm for a unary
resource constraint. Let us recall the classical edge-finding rules [2]. Consider a set
$\Omega \subseteq T$ and an activity $i \notin \Omega$. If the following condition holds, then the activity
i has to be scheduled after all activities from $\Omega$:

$$\forall \Omega \subset T,\ \forall i \in (T \setminus \Omega):$$
$$\mathrm{est}_{\Omega \cup \{i\}} + p_{\Omega \cup \{i\}} = \min\{\mathrm{est}_\Omega, \mathrm{est}_i\} + p_\Omega + p_i > \mathrm{lct}_\Omega \ \Rightarrow\ \Omega \ll i \qquad (2)$$

Once we know that the activity i must be scheduled after the set $\Omega$, we can
adjust $\mathrm{est}_i$:

$$\Omega \ll i \ \Rightarrow\ \mathrm{est}_i := \max\{\mathrm{est}_i, \mathrm{ECT}_\Omega\} \qquad (3)$$

The edge-finding algorithm propagates according to this rule and its symmetric
version. There are several implementations of the edge-finding algorithm: two
different quadratic algorithms can be found in [7, 8], and [5] presents an O(n log n)
algorithm.

Proposition 1. Let $\Theta(j) = \{k,\ k \in T\ \&\ \mathrm{lct}_k \le \mathrm{lct}_j\}$. The rules (2), (3) are
not stronger than the following rule:

$$\forall j \in T,\ \forall i \in T \setminus \Theta(j):$$
$$\mathrm{ECT}_{\Theta(j) \cup \{i\}} > \mathrm{lct}_j \ \Rightarrow\ \Theta(j) \ll i \ \Rightarrow\ \mathrm{est}_i := \max\{\mathrm{est}_i, \mathrm{ECT}_{\Theta(j)}\} \qquad (4)$$

Actually, the rules (2) and (3) are equivalent to the rule (4). However, the proof of
their equivalence (the reverse implication) is rather technical and therefore it is not
included in the main body of this paper. An interested reader can find this proof in
the appendix of this paper.

Proof. Let us consider a set $\Omega \subseteq T$ and an activity $i \in T \setminus \Omega$. Let j be one of the
activities from $\Omega$ for which $\mathrm{lct}_j = \mathrm{lct}_\Omega$. Thanks to this definition of j we have
$\Omega \subseteq \Theta(j)$ and so (recall the definition (1) of ECT):

$$\mathrm{est}_{\Omega \cup \{i\}} + p_{\Omega \cup \{i\}} = \min\{\mathrm{est}_\Omega, \mathrm{est}_i\} + p_\Omega + p_i \le \mathrm{ECT}_{\Theta(j) \cup \{i\}}$$
$$\mathrm{ECT}_\Omega \le \mathrm{ECT}_{\Theta(j)}$$

Thus: when the original rule (2) holds for $\Omega$ and i, then the new rule (4) holds
for $\Theta(j)$ and i too, and the change of $\mathrm{est}_i$ is at least the same as the change by
the rule (3). □

Property 1. The rule (4) has a very useful property. Let us consider an activity
i and two different activities $j_1$ and $j_2$ for which the rule (4) holds. Moreover
let $\mathrm{lct}_{j_1} \le \mathrm{lct}_{j_2}$. Then $\Theta(j_1) \subseteq \Theta(j_2)$ and so $\mathrm{ECT}_{\Theta(j_1)} \le \mathrm{ECT}_{\Theta(j_2)}$, therefore $j_2$
yields better propagation than $j_1$. Thus for a given activity i it is sufficient to
look for the activity j for which (4) holds and $\mathrm{lct}_j$ is maximum.
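
As a reference point, rule (4) can be applied directly, without any special data structure. The Python sketch below does exactly that; it is deliberately slow (it recomputes ECT from scratch for every pair), and the function names and the dictionary layout name -> (est, lct, p) are assumptions made for this example. Taking the maximum over all j has the same effect as using only the j with the largest lct, as Property 1 states.

```python
def ect(acts):
    """ECT of a list of (est, p) pairs, following definition (1)."""
    best, tail = float("-inf"), 0
    for est, p in sorted(acts, key=lambda a: a[0], reverse=True):
        tail += p
        best = max(best, est + tail)
    return best


def edge_find_naive(acts):
    """Apply rule (4) to acts: name -> (est, lct, p); return the new est values."""
    new_est = {i: a[0] for i, a in acts.items()}
    for j in acts:
        lct_j = acts[j][1]
        theta = [k for k in acts if acts[k][1] <= lct_j]  # Theta(j)
        ect_theta = ect([(acts[k][0], acts[k][2]) for k in theta])
        for i in acts:
            if i in theta:
                continue
            with_i = ect([(acts[k][0], acts[k][2]) for k in theta + [i]])
            if with_i > lct_j:  # Theta(j) must precede i
                new_est[i] = max(new_est[i], ect_theta)
    return new_est


acts = {"x": (0, 10, 4), "y": (1, 10, 4), "z": (0, 30, 3)}
print(edge_find_naive(acts))  # {'x': 0, 'y': 1, 'z': 8}
```

In the example, x, y and z together need 11 time units from time 0, which exceeds the deadline 10 of x and y, so z must come after both and cannot start before ECT of {x, y}, which is 8.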



2.3 Θ-Λ-Tree

A Θ-Λ-tree is an extension of the Θ-tree introduced in [10]. A Θ-tree is a data
structure designed to represent a set of activities $\Theta \subseteq T$ and to quickly compute
$\mathrm{ECT}_\Theta$. The Θ-tree was already successfully used to speed up two filtering algorithms
for the unary resource: not-first/not-last and detectable precedences [10].

In a Θ-tree, activities are represented as nodes in a balanced binary search
tree with respect to est. In the following we will not make a difference between
an activity and the tree node which represents that activity. Besides the activity
itself, each node k of a Θ-tree holds the following two values:

$$\Sigma P_k = \sum_{j \in \mathrm{Subtree}(k)} p_j$$
$$\mathrm{ECT}_k = \mathrm{ECT}_{\mathrm{Subtree}(k)} = \max\{\mathrm{est}_{\Theta'} + p_{\Theta'},\ \Theta' \subseteq \mathrm{Subtree}(k)\}$$

where Subtree(k) is the set of all activities in the subtree rooted at node k (including
activity k itself). The values $\Sigma P_k$ and $\mathrm{ECT}_k$ can be computed recursively from
the direct descendants of the node (for more details see [10]):

$$\Sigma P_k = \Sigma P_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)} \qquad (5)$$
$$\mathrm{ECT}_k = \max\{\, \mathrm{ECT}_{\mathrm{right}(k)},\ \ \mathrm{est}_k + p_k + \Sigma P_{\mathrm{right}(k)},\ \ \mathrm{ECT}_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)} \,\} \qquad (6)$$
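
The recursions (5) and (6) can be checked on a small static tree. The sketch below builds an implicit balanced binary search tree over activities already sorted by est (the middle element of each range is the node) and evaluates both values bottom-up; the function name `theta_tree_values` and the (est, p) tuple layout are assumptions, and no insertion or deletion is shown.

```python
NEG_INF = float("-inf")


def theta_tree_values(acts):
    """Return (sum_p, ect) at the root for acts = [(est, p), ...] sorted by est."""

    def build(lo, hi):  # subtree over the half-open index range [lo, hi)
        if lo >= hi:
            return 0, NEG_INF  # empty subtree: sum is 0, ECT is -inf
        mid = (lo + hi) // 2
        sum_l, ect_l = build(lo, mid)
        sum_r, ect_r = build(mid + 1, hi)
        est_k, p_k = acts[mid]
        sum_k = sum_l + p_k + sum_r                # formula (5)
        ect_k = max(ect_r,                         # formula (6)
                    est_k + p_k + sum_r,
                    ect_l + p_k + sum_r)
        return sum_k, ect_k

    return build(0, len(acts))


print(theta_tree_values([(0, 5), (25, 9), (30, 5), (32, 5)]))  # (24, 44)
```

In a dynamic tree the same two lines would be re-evaluated only along the path from a modified node to the root, which is what gives the logarithmic update time.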




Fig. 1. An example of a Θ-tree for Θ = {a, b, c, d, e, f, g}.



Let us now consider the alternative edge-finding rule (4). We choose an arbitrary
activity j and now we want to check the rule (4) for each applicable activity i,
i.e. we would like to find all activities i for which the following condition holds:

$$\mathrm{ECT}_{\Theta(j) \cup \{i\}} > \mathrm{lct}_j$$

Unfortunately, such an algorithm would be too slow: before the check can be
performed, each particular activity i must be added into the Θ-tree, and after
the check the activity i has to be removed from the Θ-tree again.

The idea for overcoming this problem is to extend the Θ-tree structure in the
following way: all applicable activities i will also be included in the tree, but as
gray nodes. A gray node represents an activity i which is not really in the set
$\Theta$. However, we are curious what would happen with $\mathrm{ECT}_\Theta$ if we were allowed to
include one of the gray activities into the set $\Theta$. More exactly: let $\Lambda \subseteq T$ be the
set of all gray activities, $\Lambda \cap \Theta = \emptyset$. The purpose of the Θ-Λ-tree is to compute
the following value:

$$\overline{\mathrm{ECT}}(\Theta, \Lambda) = \max\{\{\mathrm{ECT}_\Theta\} \cup \{\mathrm{ECT}_{\Theta \cup \{i\}},\ i \in \Lambda\}\}$$

The meaning of the values ECT and $\Sigma P$ in the new tree remains the same,
however only regular (white) nodes are taken into account. Moreover, in order
to compute $\overline{\mathrm{ECT}}(\Theta, \Lambda)$ quickly, we add the following two values into each node
of the tree:

$$\overline{\Sigma P}_k = \max\{p_{\Theta'},\ \Theta' \subseteq \mathrm{Subtree}(k)\ \&\ |\Theta' \cap \Lambda| \le 1\}$$
$$\phantom{\overline{\Sigma P}_k} = \max\{\{0\} \cup \{p_i,\ i \in \mathrm{Subtree}(k) \cap \Lambda\}\} + \sum_{i \in \mathrm{Subtree}(k) \cap \Theta} p_i$$
$$\overline{\mathrm{ECT}}_k = \overline{\mathrm{ECT}}_{\mathrm{Subtree}(k)} = \max\{\mathrm{est}_{\Theta'} + p_{\Theta'},\ \Theta' \subseteq \mathrm{Subtree}(k)\ \&\ |\Theta' \cap \Lambda| \le 1\}$$

$\overline{\Sigma P}$ is the maximum sum of the processing times of activities in a subtree if one of
the gray activities can be used. Similarly, $\overline{\mathrm{ECT}}$ is the earliest completion time of a
subtree with at most one gray activity included.

An idea how to compute the values $\overline{\Sigma P}_k$ and $\overline{\mathrm{ECT}}_k$ in node k follows. A gray
activity can be used only once. Therefore when computing $\overline{\Sigma P}_k$ and $\overline{\mathrm{ECT}}_k$, a
gray activity can be used only in one of the following places: in the left subtree
of k, by the activity k itself (if it is gray), or in the right subtree of k. Note that
the gray activity used for $\overline{\Sigma P}_k$ can be different from the gray activity used for
$\overline{\mathrm{ECT}}_k$. The formulae (5) and (6) can be modified to handle gray nodes.

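We distinguish two cases: node k is gray or node k is white. When k is white
then:

$$\overline{\Sigma P}_k = \max\{\, \overline{\Sigma P}_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)},\ \ \Sigma P_{\mathrm{left}(k)} + p_k + \overline{\Sigma P}_{\mathrm{right}(k)} \,\}$$

$$\begin{aligned}
\overline{\mathrm{ECT}}_k = \max\{\ & \overline{\mathrm{ECT}}_{\mathrm{right}(k)}, && \text{(a)} \\
 & \mathrm{est}_k + p_k + \overline{\Sigma P}_{\mathrm{right}(k)}, && \text{(b)} \\
 & \mathrm{ECT}_{\mathrm{left}(k)} + p_k + \overline{\Sigma P}_{\mathrm{right}(k)}, && \text{(c)} \\
 & \overline{\mathrm{ECT}}_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)} \ \} && \text{(c)}
\end{aligned}$$

Line (a) considers all sets $\Theta'$ such that $\Theta' \subseteq \mathrm{Subtree}(\mathrm{right}(k))$ (see the
definition (1) of ECT). Line (b) considers all sets $\Theta'$ such that $\Theta' \subseteq
\mathrm{Subtree}(\mathrm{right}(k)) \cup \{k\}$ and $k \in \Theta'$. Finally, lines (c) consider sets $\Theta'$ such that
$\Theta' \cap \mathrm{Subtree}(\mathrm{left}(k)) \ne \emptyset$.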

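When k is gray then (the meaning of the labels (a), (b) and (c) remains the
same):

$$\overline{\Sigma P}_k = \max\{\, \overline{\Sigma P}_{\mathrm{left}(k)} + \Sigma P_{\mathrm{right}(k)},\ \ \Sigma P_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)},\ \ \Sigma P_{\mathrm{left}(k)} + \overline{\Sigma P}_{\mathrm{right}(k)} \,\}$$

$$\begin{aligned}
\overline{\mathrm{ECT}}_k = \max\{\ & \overline{\mathrm{ECT}}_{\mathrm{right}(k)}, && \text{(a)} \\
 & \mathrm{est}_k + p_k + \Sigma P_{\mathrm{right}(k)}, && \text{(b)} \\
 & \mathrm{ECT}_{\mathrm{left}(k)} + \overline{\Sigma P}_{\mathrm{right}(k)}, && \text{(c)} \\
 & \mathrm{ECT}_{\mathrm{left}(k)} + p_k + \Sigma P_{\mathrm{right}(k)}, && \text{(c)} \\
 & \overline{\mathrm{ECT}}_{\mathrm{left}(k)} + \Sigma P_{\mathrm{right}(k)} \ \} && \text{(c)}
\end{aligned}$$

Fig. 2. An example of a Θ-Λ-tree for Θ = {a, c, d, e, f} and Λ = {b, g}.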



Thanks to these recursive formulae, $\overline{\mathrm{ECT}}$ and $\overline{\Sigma P}$ can be computed within
the usual operations on balanced binary trees without changing their time complexities.
Note that together with $\overline{\mathrm{ECT}}$ we can compute for each node k the gray
activity responsible for $\overline{\mathrm{ECT}}_k$. We need to know this responsible gray activity
in the following algorithms.

Table 1 shows the time complexities of some operations on the Θ-Λ-tree.



2.4 Edge-Finding Algorithm 

The algorithm starts with $\Theta = T$ and $\Lambda = \emptyset$. Activities are sequentially (in
descending order by $\mathrm{lct}_j$) moved from the set $\Theta$ into the set $\Lambda$, i.e. white nodes
are discolored to gray. As soon as $\overline{\mathrm{ECT}}(\Theta, \Lambda) > \mathrm{lct}_\Theta$, a responsible gray activity
i is updated. Thanks to Property 1 the activity i cannot be updated
better, therefore we can remove the activity i from the tree (i.e. remove it from
the set $\Lambda$).

Table 1. Time complexities of operations on the Θ-Λ-tree.

Operation                                                                   Time Complexity
$(\Theta, \Lambda) := (\emptyset, \emptyset)$                               O(1)
$(\Theta, \Lambda) := (T, \emptyset)$                                       O(n log n)
$(\Theta, \Lambda) := (\Theta \setminus \{i\}, \Lambda \cup \{i\})$         O(log n)
$\Theta := \Theta \cup \{i\}$                                               O(log n)
$\Lambda := \Lambda \cup \{i\}$                                             O(log n)
$\Lambda := \Lambda \setminus \{i\}$                                        O(log n)
$\overline{\mathrm{ECT}}(\Theta, \Lambda)$                                  O(1)
$\mathrm{ECT}_\Theta$                                                       O(1)

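 1  for $i \in T$ do
 2     $\mathrm{est}'_i := \mathrm{est}_i$;
 3  $(\Theta, \Lambda) := (T, \emptyset)$;
 4  Q := queue of all activities $j \in T$ in descending order of $\mathrm{lct}_j$;
 5  j := Q.first;
 6  repeat
 7     $(\Theta, \Lambda) := (\Theta \setminus \{j\}, \Lambda \cup \{j\})$;
 8     Q.dequeue;
 9     j := Q.first;
10     if $\mathrm{ECT}_\Theta > \mathrm{lct}_j$ then
11        fail; {Resource is overloaded}
12     while $\overline{\mathrm{ECT}}(\Theta, \Lambda) > \mathrm{lct}_j$ do begin
13        i := gray activity responsible for $\overline{\mathrm{ECT}}(\Theta, \Lambda)$;
14        $\mathrm{est}'_i := \max\{\mathrm{est}_i, \mathrm{ECT}_\Theta\}$;
15        $\Lambda := \Lambda \setminus \{i\}$;
16     end;
17  until Q.size = 0;
18  for $i \in T$ do
19     $\mathrm{est}_i := \mathrm{est}'_i$;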

Note that at line 13 there has to be some gray activity responsible for
$\overline{\mathrm{ECT}}(\Theta, \Lambda)$, because otherwise we would have ended with a fail at line 11.

During the entire run of the algorithm, the maximum number of iterations of
the inner while loop is n, because each iteration removes an activity from the
set $\Lambda$. Similarly, the number of iterations of the repeat loop is n, because each time
an activity is removed from the queue Q. According to Table 1 the time complexity
of each single line within the loops is at most O(log n). Therefore the time
complexity of the whole algorithm is O(n log n).
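
The whole procedure fits in a short Python sketch. Several choices here are assumptions made for brevity rather than the paper's design: activities are stored in the leaves of a complete binary tree (a common simplification, whereas the text stores them in internal nodes of a balanced search tree), so the white and gray cases collapse into one combine step; activities are (est, lct, p) tuples; the names `ThetaLambdaTree` and `edge_finding` are hypothetical; and the main loop stops when only one activity is left in the queue instead of reading from an empty queue.

```python
NEG = float("-inf")
EMPTY = (0, NEG, 0, NEG, None, None)


class ThetaLambdaTree:
    """Leaf-based Theta-Lambda-tree; leaves are activities sorted by est.

    Each node holds (sum_p, ect, sum_p_bar, ect_bar, resp_sum_p, resp_ect).
    The *_bar values may use at most one gray activity; resp_* is the gray
    activity they use, or None when they use none.
    """

    def __init__(self, acts):
        self.acts = acts  # list of (est, lct, p)
        order = sorted(range(len(acts)), key=lambda i: acts[i][0])
        self.pos = {i: k for k, i in enumerate(order)}
        self.size = 1
        while self.size < len(acts):
            self.size *= 2
        self.node = [EMPTY] * (2 * self.size)
        for i in range(len(acts)):
            self.set(i, "white")  # O(n log n) construction of (T, {})

    def set(self, i, color):
        est, _, p = self.acts[i]
        k = self.size + self.pos[i]
        if color == "white":
            self.node[k] = (p, est + p, p, est + p, None, None)
        elif color == "gray":
            self.node[k] = (0, NEG, p, est + p, i, i)
        else:  # removed from both Theta and Lambda
            self.node[k] = EMPTY
        k //= 2
        while k >= 1:  # recompute the path to the root: O(log n)
            self.node[k] = self._combine(self.node[2 * k], self.node[2 * k + 1])
            k //= 2

    @staticmethod
    def _combine(l, r):
        first = lambda t: t[0]
        sum_p = l[0] + r[0]
        ect = max(r[1], l[1] + r[0])
        sum_p_bar, resp_sp = max((l[2] + r[0], l[4]), (l[0] + r[2], r[4]), key=first)
        ect_bar, resp_ect = max((r[3], r[5]), (l[1] + r[2], r[4]),
                                (l[3] + r[0], l[5]), key=first)
        return (sum_p, ect, sum_p_bar, ect_bar, resp_sp, resp_ect)

    def root(self):
        return self.node[1]


def edge_finding(acts):
    """Return tightened est values; raise ValueError if an overload is found."""
    n = len(acts)
    est = [a[0] for a in acts]
    tree = ThetaLambdaTree(acts)
    queue = sorted(range(n), key=lambda i: acts[i][1], reverse=True)
    for q in range(n - 1):
        tree.set(queue[q], "gray")      # move the head of the queue to Lambda
        lct_j = acts[queue[q + 1]][1]   # j := new head of the queue
        if tree.root()[1] > lct_j:
            raise ValueError("resource is overloaded")
        while tree.root()[3] > lct_j:
            i = tree.root()[5]          # responsible gray activity
            est[i] = max(est[i], tree.root()[1])
            tree.set(i, "removed")
    return est


print(edge_finding([(0, 10, 4), (1, 10, 4), (0, 30, 3)]))  # [0, 1, 8]
```

Each call to `set` touches one root-to-leaf path, and every activity is turned gray once and removed at most once, which matches the O(n log n) bound argued above.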

Note that at the beginning $\Theta = T$ and $\Lambda = \emptyset$, hence there are no gray
activities and therefore $\overline{\mathrm{ECT}}_k = \mathrm{ECT}_k$ and $\overline{\Sigma P}_k = \Sigma P_k$ for each node k. Hence
we can save some time by building the initial Θ-Λ-tree as a “normal” Θ-tree.

3 Optional Activities

Nowadays, many practical scheduling problems have to deal with alternatives 
- activities which can choose their resource, or activities which exist only if a 
particular alternative of processing is chosen. From the resource point of view, 
it is not yet decided whether such activities will be processed or not. Therefore 
we will call such activities optional. For an optional activity, we would like to 
speculate what would happen if the activity actually would be processed by the 
resource. 

Traditionally, resource constraints are not designed to handle optional activ- 
ities properly. However, several different modifications are used to model them: 

Dummy activities. This is basically a workaround for constraint solvers which
do not allow adding more activities to the resource during problem solving
(i.e. the resource constraint is not dynamic [3]). Processing times of activities are
changed from constants to domain variables. Several “dummy” activities with
processing time domain $[0, \infty)$ are added to the resource as a reserve for
possible activity addition. Filtering algorithms work as usual, but they use
the minimum possible processing time instead of the original constant processing
time. Note that dummy activities have no influence on other activities on
the resource, because their processing time can be zero. Once an alternative
is chosen, a dummy activity is turned into a regular activity (i.e. the minimum
processing time is no longer zero). In this approach, the impossibility of an
alternative cannot be detected before that alternative is actually tried.

Filtering of options. The idea is to run a filtering algorithm several times,
each time with one of the optional activities added to the resource. When a
fail is found, the optional activity is rejected. Otherwise the time bounds
of the optional activity can be adjusted. [4] introduces so-called
PEX-edge-finding with time complexity $O(n^3)$. This is a pretty strong propagation,
but rather time consuming.

Modified filtering algorithms. Regular and optional activities are treated
differently: optional activities do not influence any other activity on the
resource, whereas regular activities influence other regular activities and also
optional activities [6]. Most of the filtering algorithms can be modified this
way without changing their time complexities. However, this approach is
slightly weaker than the previous one, because the previous approach also
checks whether the addition of an optional activity would cause an
immediate fail.

Cumulative resources. If we have a set of similar alternative machines, this 
set can be modeled as a cumulative resource. This additional (redundant) 
constraint can improve the propagation before activities are distributed be- 
tween the machines. There is also a special filtering algorithm [11] designed 
to handle this type of alternatives. 

To handle optional activities we extend each activity $i$ by a variable called
$\mathrm{existence}_i$ with the domain {true, false}. When $\mathrm{existence}_i$ = true, then $i$ is a
regular activity; when the domain of $\mathrm{existence}_i$ is still {true, false}, then $i$ is an
optional activity. Finally, when $\mathrm{existence}_i$ = false we simply exclude this activity
from all our considerations.

To make the notation concerning optional activities easy, let R be the set of 
all regular activities and O the set of all optional activities. 
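
The three states of the existence variable can be pictured with a small Python sketch. It is only an illustration under our own assumptions: the class name `Activity`, its fields, and the helper `split_regular_optional` are hypothetical and do not come from any particular solver.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    name: str
    est: int  # earliest start time
    lct: int  # latest completion time
    p: int    # processing time
    # Domain of the existence variable (hypothetical encoding):
    # {True} = regular, {True, False} = optional, {False} = excluded.
    existence: set = field(default_factory=lambda: {True})

    def is_regular(self):
        return self.existence == {True}

    def is_optional(self):
        return self.existence == {True, False}


def split_regular_optional(activities):
    """Return (R, O); activities with existence = false are dropped."""
    R = [a for a in activities if a.is_regular()]
    O = [a for a in activities if a.is_optional()]
    return R, O


tasks = [
    Activity("a", est=0, lct=10, p=4),
    Activity("b", est=2, lct=12, p=3, existence={True, False}),
    Activity("c", est=1, lct=9, p=2, existence={False}),
]
R, O = split_regular_optional(tasks)
```

Here activity "a" ends up in R, "b" in O, and "c" is ignored, which mirrors the three cases distinguished above.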

For optional activities, we would like to consider the following issues: 

1. If an optional activity should be processed by the resource (i.e. if an optional
activity is changed to a regular activity), would the resource be overloaded?
The resource is overloaded if there is a set $\Omega \subseteq R$ such that:

$$\mathrm{lct}_\Omega - \mathrm{est}_\Omega < p_\Omega$$

Certainly, if a resource is overloaded then the problem has no solution. Hence
if the addition of an optional activity $i$ results in overloading, we can
conclude that $\mathrm{existence}_i$ = false.

2. If the addition of an optional activity $i$ does not result in overloading, what
is the earliest possible start time and the latest possible completion time of
the activity $i$ with respect to regular activities on the resource? We would
like to apply the usual filtering algorithms to the activity $i$; however, the activity
$i$ must not cause a change to any regular activity.

3. If we add an optional activity $i$, will the first run of a filtering algorithm
result in a fail? For example, the detectable precedences algorithm can increase
$\mathrm{est}_k$ of some activity $k$ so much that $\mathrm{est}_k + p_k > \mathrm{lct}_k$. In that case we can
also propagate $\mathrm{existence}_i$ = false.

We will consider item 1 in the next section, “Overload Checking with Optional
Activities”. Items 2 and 3 are discussed in the section “Filtering with Optional
Activities”.

4 Overload Checking with Optional Activities 

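Let us consider an arbitrary set $\Omega \subseteq R$ of regular activities. The overload rule says
that if the set $\Omega$ cannot be processed within its time bounds then no solution
exists:

$$\mathrm{lct}_\Omega - \mathrm{est}_\Omega < p_\Omega \Rightarrow \mathrm{fail}$$

Let us suppose for a while that we are given an activity $i \in T$ and we want to
check this rule only for those sets $\Omega \subseteq T$ which have $\mathrm{lct}_\Omega = \mathrm{lct}_i$. Now consider
a set $\Theta$:

$$\Theta = \{j,\ j \in R \ \&\ \mathrm{lct}_j \le \mathrm{lct}_i\}$$

An overloaded set $\Omega$ with $\mathrm{lct}_\Omega = \mathrm{lct}_i$ exists if and only if
$\mathrm{ECT}_\Theta > \mathrm{lct}_i = \mathrm{lct}_\Theta$. The idea of the algorithm is to gradually
enlarge the set $\Theta$ by increasing $\mathrm{lct}_\Theta$.
For each $\mathrm{lct}_\Theta$ we check whether $\mathrm{ECT}_\Theta > \mathrm{lct}_\Theta$ or not.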
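
The sweep just described can be sketched naively in Python. This is an illustration only: activities are assumed to be (est, lct, p) triples, the function names `ect` and `overloaded` are ours, and the quadratic computation of ECT stands in for the O(log n) tree update used by the actual algorithm.

```python
def ect(theta):
    """Earliest completion time of a list of (est, lct, p) triples.

    ECT is the maximum of est_Omega + p_Omega over nonempty subsets Omega.
    For a fixed earliest start e it suffices to take every activity whose
    est is at least e, so only len(theta) candidate subsets are examined.
    """
    best = float("-inf")
    for e, _, _ in theta:
        total = sum(p for est, _, p in theta if est >= e)
        best = max(best, e + total)
    return best


def overloaded(activities):
    """Return True if some set Omega has lct_Omega - est_Omega < p_Omega."""
    theta = []
    for act in sorted(activities, key=lambda a: a[1]):  # ascending lct
        theta.append(act)
        if ect(theta) > act[1]:  # ECT_Theta > lct_Theta
            return True
    return False
```

For example, two activities with est 0 and 1, lct 5 and processing time 3 each need 6 time units within the window [0, 5], so `overloaded` reports a fail for them.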

But what about optional activities? Let $\Lambda$ be the following set:

$$\Lambda = \{j,\ j \in O \ \&\ \mathrm{lct}_j \le \mathrm{lct}_i\}$$

An optional activity can cause overloading if and only if $\mathrm{ECT}(\Theta, \Lambda) > \mathrm{lct}_i$. The
following algorithm is an extension of the algorithm presented in [10]. Optional
activities are represented by gray nodes in the $\Theta$-$\Lambda$-tree.

The algorithm deletes all optional activities $k$ such that the addition
of $k$ alone causes an overload. Of course, a combination of several
optional activities that are not deleted may still cause an overload!

(Θ, Λ) := (∅, ∅);
for i ∈ T in ascending order of lct_i do begin
    if i is a regular activity then begin
        Θ := Θ ∪ {i};
        if ECT_Θ > lct_i then
            fail; { No solution exists }
    end else
        Λ := Λ ∪ {i};
    while ECT(Θ, Λ) > lct_i do begin
        k := optional activity responsible for ECT(Θ, Λ);
        existence_k := false;
        Λ := Λ \ {k};
    end;
end;

The complexity of the algorithm is again $O(n \log n)$. The inner while loop is
repeated at most $n$ times in total, because each iteration removes an activity from the
set $\Lambda$. The outer for loop also has $n$ iterations, and the time complexity of each
single line is at most $O(\log n)$ (see Table 1).
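
To make the control flow of the algorithm concrete, here is a naive Python sketch. It assumes that ECT(Θ, Λ) denotes the largest ECT obtainable from Θ together with at most one optional activity from Λ; this reading, the dictionary layout and the function names are assumptions of the example, and the quadratic ECT computation replaces the tree operations, so the sketch does not reach the O(n log n) bound.

```python
def ect(items):
    """Naive ECT of a list of (est, p) pairs; -inf for the empty list."""
    best = float("-inf")
    for e, _ in items:
        best = max(best, e + sum(p for est, p in items if est >= e))
    return best


def filter_optional(acts):
    """acts maps a name to (est, lct, p, optional).

    Returns the set of optional activities whose existence must be false.
    Raises ValueError when the regular activities alone are overloaded.
    """
    theta, lam, rejected = [], {}, set()
    for name in sorted(acts, key=lambda n: acts[n][1]):  # ascending lct
        est, lct, p, optional = acts[name]
        if optional:
            lam[name] = (est, p)
        else:
            theta.append((est, p))
            if ect(theta) > lct:
                raise ValueError("no solution exists")
        while lam:
            # Optional activity responsible for ECT(Theta, Lambda).
            k = max(lam, key=lambda n: ect(theta + [lam[n]]))
            if ect(theta + [lam[k]]) <= lct:
                break
            rejected.add(k)  # existence_k := false
            del lam[k]
    return rejected
```

With regular activities a = (est 0, lct 5, p 3) and b = (0, 6, 3), an optional activity x = (0, 6, 2) cannot be added without overload and is rejected, while an optional activity y = (4, 20, 2) survives.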

5 Filtering with Optional Activities 

The following section is an example of how to extend a certain class of filtering
algorithms to handle optional activities. The idea is simple: if the original algorithm
uses a $\Theta$-tree, we will use a $\Theta$-$\Lambda$-tree instead. The difference is that we represent
optional activities by gray nodes. For propagation we still use $\mathrm{ECT}_\Theta$, but
we can also check $\mathrm{ECT}(\Theta, \Lambda)$. If propagation using $\mathrm{ECT}(\Theta, \Lambda)$ would result in
an immediate fail, we can exclude the optional activity responsible for that.

Let us demonstrate this idea on the detectable precedences algorithm:

(Θ, Λ) := (∅, ∅);
Q := queue of all activities j ∈ T in ascending order of lct_j − p_j;
for i ∈ T in ascending order of est_i + p_i do begin
    while est_i + p_i > lct_Q.first − p_Q.first do begin
        if Q.first is a regular activity then
            Θ := Θ ∪ {Q.first};
        else
            Λ := Λ ∪ {Q.first};
        Q.dequeue;
    end;
    est'_i := max{est_i, ECT_(Θ \ {i})};
    if i is a regular activity then
        while ECT(Θ \ {i}, Λ) + p_i > lct_i do begin
            k := an optional activity responsible for ECT(Θ \ {i}, Λ);
            Λ := Λ \ {k};
            existence_k := false;
        end;
end;
for i ∈ T do
    est_i := est'_i;

The complexity of the algorithm remains the same: $O(n \log n)$.

The same idea can be used to extend the not-first/not-last algorithm
presented in [10]. However, extending the edge-finding algorithm is not so easy:
the edge-finding algorithm already uses a $\Theta$-$\Lambda$-tree. We will consider this in our future
work.



6 Experimental Results 

We tested the new edge-finding algorithm on several benchmark jobshop
problems taken from the OR library [1]. The benchmark problem is to compute a
destructive lower bound using the shaving technique. The destructive lower bound is
the minimal makespan for which propagation is not able to find a conflict without
backtracking. Because the destructive lower bound is computed too quickly, we also
use shaving as suggested in [7]. Shaving is similar to a proof by contradiction.
We choose an activity $i$, limit its $\mathrm{est}_i$ or $\mathrm{lct}_i$ and propagate. If an infeasibility
is found, then the limitation was invalid and so we can decrease $\mathrm{lct}_i$ or increase
$\mathrm{est}_i$. Binary search is used to find the best shave. To limit CPU time, shaving
was used only once for each activity.
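
The binary search behind a shave can be sketched as follows. The sketch is hypothetical: `consistent(lo, hi)` stands in for one propagation run with the start time of the chosen activity limited to [lo, hi], and is assumed to be monotone (refuting an interval also refutes every sub-interval that starts at the same point). The names `shave_est` and `consistent` are ours.

```python
def shave_est(est, lst, consistent):
    """Find the smallest s in [est, lst] such that limiting the start time
    to [est, s] is not refuted by propagation.

    `consistent(lo, hi)` is a hypothetical callback that returns False when
    propagation finds an infeasibility. Returns the new est, or None if
    every start time is refuted.
    """
    if not consistent(est, lst):
        return None
    lo, hi = est, lst
    while lo < hi:
        mid = (lo + hi) // 2
        if consistent(est, mid):
            hi = mid      # some start time <= mid may still be feasible
        else:
            lo = mid + 1  # all start times <= mid are refuted
    return lo
```

A symmetric routine would shave the latest completion time by limiting the earliest start instead.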

Table 2 shows the results. We measured the CPU time¹ needed to prove
the lower bound, i.e. the propagation is done twice: with the upper bounds LB
and LB−1. Times T1-T3 show the running time for different implementations of the
edge-finding algorithm: T1 is the new algorithm, T2 is the algorithm [7] and
T3 is the algorithm [8]. As can be seen, the new algorithm is quite competitive
for $n = 10$ and $n = 15$; for $n \ge 20$ it is faster than the other two edge-finding
algorithms.

¹ Benchmarks were performed on an Intel Pentium Centrino 1300 MHz.

Table 2. Destructive Lower Bounds.

Prob.   Size         LB  T1      T2      T3
abz5    10 x 10    1196  1.430   1.421   1.466
abz6    10 x 10     941  1.773   1.762   1.815
orb01   10 x 10    1017  1.773   1.783   1.841
orb02   10 x 10     869  1.491   1.486   1.529
ft10    10 x 10     911  1.616   1.618   1.669
la21    15 x 10    1033  0.752   0.784   0.815
la22    15 x 10     925  3.486   3.597   3.763
la36    15 x 15    1267  5.376   5.520   5.768
la37    15 x 15    1397  2.498   2.572   2.667
ta01    15 x 15    1224  9.113   9.304   9.652
ta02    15 x 15    1210  7.097   7.264   7.586
la26    20 x 10    1218  0.749   0.838   0.899
la27    20 x 10    1235  0.908   0.994   1.054
la29    20 x 10    1119  3.357   3.609   3.816
abz7    20 x 15     651  3.283   3.446   3.579
abz8    20 x 15     621  12.00   12.54   13.14
ta11    20 x 15    1295  14.72   15.31   16.03
ta12    20 x 15    1336  17.54   18.30   19.26
ta21    20 x 20    1546  38.43   39.79   41.90
ta22    20 x 20    1501  25.47   26.25   27.37
yn1     20 x 20     816  26.79   27.58   28.91
yn2     20 x 20     842  22.86   23.59   24.69
ta31    30 x 15    1764  4.788   5.485   5.936
ta32    30 x 15    1774  6.515   7.390   7.946
swv11   50 x 10    2983  15.70   19.70   21.62
swv12   50 x 10    2972  19.21   23.43   25.23
ta51    50 x 15    2760  11.68   14.58   15.88
ta52    50 x 15    2756  12.07   15.04   16.32
ta71    100 x 20   5464  131.6   173.6   189.3
ta72    100 x 20   5181  132.0   174.8   190.8

Optional activities were tested on modified 10x10 jobshop instances. In each
job, the activities in 5th and 6th place were taken as alternatives. Therefore in
each problem there are 20 optional activities and 80 regular activities. Table 3
shows the results. Column LB is the destructive lower bound computed without
shaving, column Opt is the optimal makespan. Column CH is the number of
choicepoints needed to find the optimal solution and prove the optimality (i.e.
the optimal makespan is used as the initial upper bound). Finally, column T is the
CPU time in seconds.

As can be seen in the table, the propagation is strong: all of the problems were
solved surprisingly quickly. However, more tests should be made, especially on
real-life problem instances.

Table 3. Alternative activities.

Prob.      Size         LB   Opt    CH  T
abz5-alt   10 x 10    1031  1093   283  0.336
abz6-alt   10 x 10     791   822    17  0.026
orb01-alt  10 x 10     894   947  9784  12.776
orb02-alt  10 x 10     708   747   284  0.328
ft10-alt   10 x 10     780   839  4814  6.298
la16-alt   10 x 10     838   842    27  0.022
la17-alt   10 x 10     673   676    24  0.021
la18-alt   10 x 10     743   750   179  0.200
la19-alt   10 x 10     686   731    84  0.103
la20-alt   10 x 10     809   809    14  0.014

Appendix

A Equivalence of the Edge-Finding Rules 

Let us consider an arbitrary set $\Omega \subseteq T$. The overload rule says that if the set $\Omega$
cannot be processed within its time bounds then no solution exists:

$$\mathrm{lct}_\Omega - \mathrm{est}_\Omega < p_\Omega \Rightarrow \mathrm{fail} \qquad (7)$$

Note that it is useless to continue filtering once a fail has been fired. Therefore
in the following we will assume that the resource is not overloaded.

Proposition 2. The rule (4) is not stronger than the original rules (2) and (3).

Proof. Let us consider a pair of activities $i$, $j$ for which the new rule (4) holds.
We define a set $\Omega'$ as a subset of $\Theta(j) \cup \{i\}$ for which:

$$\mathrm{ECT}_{\Theta(j) \cup \{i\}} = \mathrm{est}_{\Omega'} + p_{\Omega'} \qquad (8)$$

Note that thanks to the definition (1) of ECT such a set $\Omega'$ must exist.

If $i \notin \Omega'$ then $\Omega' \subseteq \Theta(j)$, therefore

$$\mathrm{est}_{\Omega'} + p_{\Omega'} \overset{(8)}{=} \mathrm{ECT}_{\Theta(j) \cup \{i\}} \overset{(4)}{>} \mathrm{lct}_j \ge \mathrm{lct}_{\Omega'}$$

So the resource is overloaded (see the overload rule (7)) and a fail should have
already been fired.

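Thus $i \in \Omega'$. Let us define $\Omega = \Omega' \setminus \{i\}$. We will assume that $\Omega \ne \emptyset$, because
otherwise $\mathrm{est}_i \ge \mathrm{ECT}_{\Theta(j)}$ and rule (4) changes nothing. For this set $\Omega$ we have:

$$\min\{\mathrm{est}_\Omega, \mathrm{est}_i\} + p_\Omega + p_i = \mathrm{est}_{\Omega'} + p_{\Omega'} \overset{(8)}{=} \mathrm{ECT}_{\Theta(j) \cup \{i\}} \overset{(4)}{>} \mathrm{lct}_j \ge \mathrm{lct}_\Omega$$

Hence the rule (2) holds for the set $\Omega$. To complete the proof we have to show
that both rules (3) and (4) adjust $\mathrm{est}_i$ equivalently, i.e. $\mathrm{ECT}_\Omega = \mathrm{ECT}_{\Theta(j)}$. We
already know that $\mathrm{ECT}_\Omega \le \mathrm{ECT}_{\Theta(j)}$ because $\Omega \subseteq \Theta(j)$. Suppose now for a
contradiction that

$$\mathrm{ECT}_\Omega < \mathrm{ECT}_{\Theta(j)} \qquad (9)$$

Let $\Phi$ be a set $\Phi \subseteq \Theta(j)$ such that:

$$\mathrm{ECT}_{\Theta(j)} = \mathrm{est}_\Phi + p_\Phi \qquad (10)$$

Therefore:

$$\mathrm{est}_\Omega + p_\Omega \le \mathrm{ECT}_\Omega \overset{(9)}{<} \mathrm{ECT}_{\Theta(j)} \overset{(10)}{=} \mathrm{est}_\Phi + p_\Phi \qquad (11)$$

Because the set $\Omega' = \Omega \cup \{i\}$ defines the value of $\mathrm{ECT}_{\Theta(j) \cup \{i\}}$ (i.e.
$\mathrm{est}_{\Omega'} + p_{\Omega'} = \mathrm{ECT}_{\Theta(j) \cup \{i\}}$), it has the following property (see the
definition (1) of ECT):

$$\forall k \in \Theta(j) \cup \{i\} : \ \mathrm{est}_k \ge \mathrm{est}_{\Omega'} \Rightarrow k \in \Omega'$$

And because $\Omega = \Omega' \setminus \{i\}$:

$$\forall k \in \Theta(j) : \ \mathrm{est}_k \ge \mathrm{est}_{\Omega'} \Rightarrow k \in \Omega \qquad (12)$$

Similarly, the set $\Phi$ defines the value of $\mathrm{ECT}_{\Theta(j)}$:

$$\forall k \in \Theta(j) : \ \mathrm{est}_k \ge \mathrm{est}_\Phi \Rightarrow k \in \Phi \qquad (13)$$

Combining properties (12) and (13) together we have that either $\Omega \subseteq \Phi$ (if
$\mathrm{est}_{\Omega'} \ge \mathrm{est}_\Phi$) or $\Phi \subseteq \Omega$ (if $\mathrm{est}_{\Omega'} \le \mathrm{est}_\Phi$). However, $\Phi \subseteq \Omega$ is not possible,
because in this case $\mathrm{est}_\Phi + p_\Phi \le \mathrm{ECT}_\Omega$, which contradicts the inequality (11).
The result is that $\Omega \subsetneq \Phi$ and so $p_\Omega < p_\Phi$.

Now we are ready to prove the contradiction:

$$\begin{aligned}
\mathrm{ECT}_{\Theta(j) \cup \{i\}} &\overset{(8)}{=} \mathrm{est}_{\Omega'} + p_{\Omega'} \\
&= \min\{\mathrm{est}_\Omega, \mathrm{est}_i\} + p_\Omega + p_i && \text{because } \Omega = \Omega' \setminus \{i\} \\
&= \min\{\mathrm{est}_\Omega + p_\Omega + p_i,\ \mathrm{est}_i + p_\Omega + p_i\} \\
&< \min\{\mathrm{est}_\Phi + p_\Phi + p_i,\ \mathrm{est}_i + p_\Phi + p_i\} && \text{by (11) and } p_\Omega < p_\Phi \\
&\le \mathrm{ECT}_{\Theta(j) \cup \{i\}} && \text{because } \Phi \subseteq \Theta(j)
\end{aligned}$$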

□

Constraint Propagation as a Proof System

Abstract. Refutation proofs can be viewed as a special case of constraint
propagation, which is a fundamental technique in solving constraint-satisfaction
problems. The generalization lifts, in a uniform way, the concept of refutation from
Boolean satisfiability problems to general constraint-satisfaction problems. On
the one hand, this enables us to study and characterize basic concepts, such as
refutation width, using tools from finite-model theory. On the other hand, this
enables us to introduce new proof systems, based on representation classes, that
have not been considered up to this point. We consider ordered binary decision
diagrams (OBDDs) as a case study of a representation class for refutations, and
compare their strength to well-known proof systems, such as resolution, the
Gaussian calculus, cutting planes, and Frege systems of bounded alternation-depth. In
particular, we show that refutations by OBDDs polynomially simulate resolution
and can be exponentially stronger.



1 Introduction 

It is well known that the satisfiability problem for Boolean formulas in conjunctive 
normal form (CNF) can be viewed as a constraint-satisfaction problem (CSP). The in- 
put to a CSP consists of a set of variables, a set of possible values for the variables, 
and a set of constraints on the variables. The question is to determine whether there is
an assignment of values to the variables that satisfies the given constraints. The study 
of CSP occupies a prominent place in artificial intelligence and computer science, be- 
cause many algorithmic problems from a wide spectrum of areas can be modeled as 
such [Dec03]. These areas include temporal reasoning, belief maintenance, machine 
vision, scheduling, graph theory, and, of course, propositional logic. Since constraint- 
satisfaction problems constitute a natural generalization of Boolean satisfiability prob- 
lems, it is natural to ask for proof systems that generalize the systems for propositional 
logic to CSP. Such systems would be used to refute the satisfiability of an instance of a 
constraint-satisfaction problem, much in the same way that resolution is used to refute 
the satisfiability of a CNF-formula. 

* Supported in part by CICYT TIC2001-1577-C03-02 and the Future and Emerging Technolo-

One of the goals of this paper is to introduce a natural and canonical way of
defining a proof system for every constraint-satisfaction problem. In order to achieve this,
first we need a unifying framework for representing such problems. This was achieved
by Feder and Vardi [FV98], who recognized that essentially all examples of CSPs in
the literature can be recast as the following fundamental algebraic problem, called the
HOMOMORPHISM PROBLEM: given two finite relational structures A and B, is there
a homomorphism $h : A \to B$? Intuitively, the structure A represents the variables
and the tuples of variables that participate in constraints, the structure B represents the
domain of values and the tuples of values that these constrained tuples of variables are
allowed to take, and the homomorphisms from A to B are precisely the assignments
of values to variables that satisfy the constraints. For instance, the 3-COLORABILITY
problem coincides with the problem of deciding whether there is a homomorphism from
a given graph G to $K_3$, where $K_3$ is the complete graph with three nodes (the triangle).
The uniform version of the HOMOMORPHISM PROBLEM, in which both structures A
and B are given as input, is the most general formulation of the constraint-satisfaction
problem. Interesting algorithmic problems, however, also arise by fixing the structure
B, which sometimes is called the template structure. Thus, the resulting problem,
denoted by CSP(B), asks: given A, is there a homomorphism from A to B? Note that
CSP($K_3$) is precisely the 3-COLORABILITY problem; more generally, CSP($K_k$) is
the $k$-COLORABILITY problem, where $K_k$ is the complete graph with $k$ nodes, $k \ge 2$.
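
As a small illustration of the homomorphism view, the following Python sketch decides the problem by brute force and applies it to colorability. The function names and the encoding of a structure as a universe plus a dictionary from relation names to sets of tuples are assumptions of this example; the search is exponential and only meant for tiny instances.

```python
from itertools import product


def has_homomorphism(a_universe, a_rel, b_universe, b_rel):
    """Brute-force test for a homomorphism from structure A to structure B.

    a_rel and b_rel map relation names to sets of tuples over the
    respective universes.
    """
    a_universe = list(a_universe)
    for values in product(b_universe, repeat=len(a_universe)):
        h = dict(zip(a_universe, values))
        if all(tuple(h[x] for x in t) in b_rel[name]
               for name, tuples in a_rel.items() for t in tuples):
            return True
    return False


def complete_graph(k):
    """Return the nodes and the (symmetric) edge relation of K_k."""
    nodes = list(range(k))
    return nodes, {"E": {(u, v) for u in nodes for v in nodes if u != v}}


def symmetric(edges):
    """Close a list of edges under reversal."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}


K3_nodes, K3 = complete_graph(3)
K4_nodes, K4 = complete_graph(4)
cycle5 = {"E": symmetric([(i, (i + 1) % 5) for i in range(5)])}
```

A cycle of length five maps homomorphically to $K_3$ (it is 3-colorable), whereas $K_4$ does not.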

With constraint-satisfaction problems presented as homomorphism problems in a
unifying way, we are closer to our first goal of defining canonical proof systems. The
approach we take is via yet another interpretation of CSPs, this time in terms of database
theory, building upon the homomorphism framework. As pointed out in [GJC94], every
constraint can be thought of as a table of a relational database, and the set of solutions
to a CSP can be identified with the tuples in the join of all constraints. This fruitful
connection between CSPs and database theory is explored further in [KV00a]. Now, a CSP
instance is unsatisfiable precisely when the join of the constraints is empty. We adopt
this approach and define a CSP(B) refutation of an instance A to be a sequence of 
constraints ending with the empty constraint, such that every constraint in the sequence 
is an initial constraint, the join of two previous constraints, the projection of some previ- 
ous constraint, or the weakening of some previous constraint. Projection and weakening 
are not strictly necessary, but provide a versatile tool for reducing the complexity of the 
intermediate constraints. Note that the join is a form of constraint propagation, since 
it allows us to derive new constraints implied by the previous ones. See the work by 
Freuder [Fre78] for the first theoretical approach to constraint propagation. 

The proof systems obtained this way are sound and complete for constraint satisfac- 
tion. We embark on the investigation of their general properties by focussing first on the 
concept of refutation width, which is the maximum arity of the constraints in a
refutation. Bounding the arity of the constraints generated during the execution of constraint
propagation algorithms has already played a crucial role in the development of the the- 
ory of CSPs, as a method to achieve tractability [Fre82,Fre90,DP87]. For example, 
various concepts of consistency popularized by the AI community rely on it [Dec03]. 
Following the ideas in [FV98, KV00a, AD03], we are able to show that the minimal
refutation width of a CSP(B) instance A is characterized by a combinatorial game
introduced in the context of finite-model theory. In turn, again following [FV98,KV00a],
this leads us naturally to considering the treewidth of the instance as a parameter. As a 
result, we obtain a deeper understanding and also a purely combinatorial characteriza- 
tion of refutation width. 

CSP refutations are perhaps too general to be of practical use. The rules are too gen- 
eral and the constraints, if represented explicitly, may be too large. Hence, we propose a 
syntactic counterpart to general CSP refutations, in which all the constraints are somehow
succinctly represented. Technically speaking, we consider representation classes for
the constraints. Some examples include clauses, linear equalities over a finite field, lin- 
ear inequalities over the integers, decision trees, decision diagrams, and so on. With this 
new formalism, CSP proofs become purely syntactical objects, closer to their counter- 
parts in propositional logic. As a case study, we investigate the proof system obtained 
by using ordered binary decision diagrams (OBDDs) as our representation class for 
constraints. OBDDs possess many desirable algorithmic properties and have been used 
successfully in many areas, most notably in formal verification (see [Bry92,BCM+92]).
We compare the strength of refutations by OBDDs with other proof systems for propo- 
sitional logic. We show that OBDD-based refutations polynomially simulate both reso- 
lution and the Gaussian calculus; moreover, they are exponentially stronger than either 
of these systems, even when the weakening rule is not allowed. If we make strong use 
of weakening, then refutations by OBDDs can polynomially simulate the cutting planes 
proof system with coefficients written in unary (called CP* in [BPR97]). In particular,
OBDDs provide polynomial-size proofs of the pigeonhole principle. This shows
already that refutations by OBDDs can be exponentially stronger than resolution, and
even Frege (Hilbert-style) systems with formulas of bounded alternation-depth, because
the pigeonhole principle is hard for them [Hak85,Ajt88,BIK+92]. Finally, we observe
that for a particular order of the variables, refutations by OBDDs have small commu- 
nication complexity. By combining this with known techniques about feasible interpo- 
lation [IPU94,Kra97], we establish that OBDD-based refutations have polynomial-size 
monotone interpolants, for a particular order of the variables. This gives exponential 
lower bounds for a number of examples, including the clique-coloring principle, still 
for that particular order. Whether the restriction on the order is necessary remains an 
interesting open problem. 



2 Preliminaries 

Constraint-Satisfaction Problems. A relational vocabulary $\sigma$ is a collection of
relation symbols $R$, each of a specified arity. A $\sigma$-structure $\mathbf{A}$ consists of a universe $A$, or
domain, and for each $R \in \sigma$, an interpretation $R^{\mathbf{A}} \subseteq A^r$, where $r$ is the arity of $R$.

Let B be a finite $\sigma$-structure. We denote by CSP(B) the class of all finite
$\sigma$-structures A such that there is a homomorphism from A to B. Recall that a
homomorphism is a mapping from the universe of A to the universe of B that preserves the
relations. As mentioned in the introduction, each CSP(B) is a constraint-satisfaction
problem. The structure B is called the template structure. Let us discuss how 3-SAT
can be modeled by a particular CSP(B). This will be of help later in the paper. The
relational vocabulary consists of four ternary relation symbols $\{R_0, R_1, R_2, R_3\}$
representing all possible types of 3-clauses: clauses with no negations, clauses with one
negation, clauses with two negations, and clauses with three negations. The template
structure T has the truth tables of these types of clauses: $R_0^{T} = \{0,1\}^3 - \{000\}$,
$R_1^{T} = \{0,1\}^3 - \{100\}$, $R_2^{T} = \{0,1\}^3 - \{110\}$, and $R_3^{T} = \{0,1\}^3 - \{111\}$. Every
3-CNF formula $\varphi$ gives rise to a $\sigma$-structure $A_\varphi$ with universe the set of variables of $\varphi$
and relations encoding the clauses of $\varphi$; for instance, $R_1^{A_\varphi}$ consists of all triples $(x, y, z)$
of variables of $\varphi$ such that $(\neg x \vee y \vee z)$ is one of the clauses of $\varphi$. Thus, CSP(T) is
equivalent to 3-SAT, since $\varphi$ is satisfiable if and only if there is a homomorphism from
$A_\varphi$ to T.
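
The encoding of 3-SAT as CSP(T) can be spelled out in a few lines of Python. This is a sketch under our own conventions: a literal is a (variable, negated) pair, every clause has three literals over distinct variables, and the names `T`, `encode` and `satisfiable` are chosen for the example.

```python
from itertools import product

CUBE = set(product((0, 1), repeat=3))
# Template T: relation i excludes the single falsifying assignment of a
# clause whose first i variables are negated (000, 100, 110, 111).
T = {i: CUBE - {tuple([1] * i + [0] * (3 - i))} for i in range(4)}


def encode(clauses):
    """Build the relations of the structure for a 3-CNF formula.

    Each clause is a list of three (variable, negated) literals. The result
    maps i to the set of variable triples of clauses with i negations,
    written with the negated variables first.
    """
    rels = {i: set() for i in range(4)}
    for clause in clauses:
        ordered = sorted(clause, key=lambda lit: not lit[1])
        rels[sum(neg for _, neg in clause)].add(tuple(v for v, _ in ordered))
    return rels


def satisfiable(clauses):
    """Search for a homomorphism from the formula's structure to T."""
    rels = encode(clauses)
    variables = sorted({v for c in clauses for v, _ in c})
    for values in product((0, 1), repeat=len(variables)):
        h = dict(zip(variables, values))
        if all(tuple(h[v] for v in t) in T[i]
               for i, ts in rels.items() for t in ts):
            return True
    return False
```

For the single clause $(\neg x \vee y \vee z)$ the triple (x, y, z) lands in relation 1, and a formula made of all eight sign patterns over x, y, z has no homomorphism to T.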

Pebble Games. The existential k-pebble games were defined in [KV95,KV00a]. The 
games are played between two players, the Spoiler and the Duplicator, on two
$\sigma$-structures A and B according to the following rules. Each player has a set of k pebbles
numbered {1, . . . , k}. In each round of the game, the Spoiler can make one of two dif- 
ferent moves: either he places a free pebble on an element of the domain of A, or he 
removes a pebble from a pebbled element of A. To each move of the Spoiler, the Du- 
plicator must respond by placing her corresponding pebble over an element of B, or 
removing her corresponding pebble from B, respectively. If the Spoiler reaches a round 
in which the set of pairs of pebbled elements is not a partial homomorphism between 
A and B, then he wins the game. Otherwise, we say that the Duplicator wins the game. 
The formal definition can be found in [KV95,KV00a], and the close relationship
between existential pebble games and constraint-satisfaction problems was discussed at
length in [KV00b].

Treewidth. The treewidth of a graph can be defined in many different ways [Bod98]. 
One way is this. The treewidth of a graph $G$ is the smallest positive integer $k$ such that
$G$ is a subgraph of a $k$-tree, where a $k$-tree is defined inductively as follows: the
$(k+1)$-clique $K_{k+1}$ is a $k$-tree, and if $G$ is a $k$-tree, then the result of adding a new node to $G$
that is adjacent to exactly the nodes of a $k$-clique of $G$ (thus forming a $(k+1)$-clique)
is also a $k$-tree. The Gaifman graph of a structure A is the graph whose set of nodes is
the universe of A, and whose edges relate pairs of elements that appear in some tuple 
of a relation of A. The treewidth of a structure is the treewidth of its Gaifman graph. 



3 Proof Systems for CSPs 

Notions from Database Theory. A relation schema $R(x_1, \ldots, x_k)$ consists of a
relation name $R$ and a set of attribute names $x_1, \ldots, x_k$. A database schema $\sigma$ is a set of
relation schemas. A relation conforming with a relation schema $R(x_1, \ldots, x_k)$ is a set
of $k$-tuples. A database over a database schema $\sigma$ is a set of relations conforming with
the relation schemas in $\sigma$. In other words, a database over $\sigma$ is a $\sigma$-structure, except that
the universe of the structure is not made explicit. In the sequel, we often conflate the
notation and use the same symbol for both a relation schema and a relation conforming
with that schema.

We use $\mathbf{x}$ to denote a tuple of attribute names $(x_1, \ldots, x_k)$ and also to denote the
set $\{x_1, \ldots, x_k\}$. It will be clear from context which case it is. Let $R$ be a relation
conforming with the relational schema $R(\mathbf{x})$. Let $\mathbf{y} \subseteq \mathbf{x}$ be a subset of the set of
attribute names. The projection of $R$ with respect to $\mathbf{y}$ is the relation whose attribute
names are $\mathbf{y}$, and whose tuples can be extended to tuples in $R$. We denote it by $\pi_{\mathbf{y}}(R)$.
Let $R$ and $S$ be relations conforming with relational schemas $R(\mathbf{x})$ and $S(\mathbf{y})$. The
relational join of $R$ and $S$, or simply join, is the largest relation $T$ whose attribute
names are $\mathbf{x} \cup \mathbf{y}$, and such that $\pi_{\mathbf{x}}(T) \subseteq R$ and $\pi_{\mathbf{y}}(T) \subseteq S$. We denote it by $R \bowtie S$.
Joins are commutative and associative, and can be extended to an arbitrary number of
relations.
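
The two operations can be sketched directly in Python. In this illustration a relation is a pair of an attribute tuple and a set of value tuples; the function names `project` and `join` and this encoding are assumptions of the example, not part of the paper.

```python
def project(attrs, rel, keep):
    """Projection: restrict every tuple of rel to the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(t[i] for i in idx) for t in rel}


def join(attrs_r, rel_r, attrs_s, rel_s):
    """Relational join over the union of the attribute names."""
    extra = [a for a in attrs_s if a not in attrs_r]
    attrs = tuple(attrs_r) + tuple(extra)
    out = set()
    for r in rel_r:
        row = dict(zip(attrs_r, r))
        for s in rel_s:
            other = dict(zip(attrs_s, s))
            # Keep the pair only if it agrees on all shared attributes.
            if all(row[a] == other[a] for a in attrs_s if a in row):
                out.add(r + tuple(other[a] for a in extra))
    return attrs, out


R_attrs, R = ("x", "y"), {(0, 1), (1, 0)}
S_attrs, S = ("y", "z"), {(1, 1), (0, 0), (1, 0)}
J_attrs, J = join(R_attrs, R, S_attrs, S)
```

Projecting the join back onto the attributes of either operand gives a subset of that operand, in line with the definition of the join as the largest such relation.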

Notions from CSPs. Let $\sigma$ be a relational vocabulary. Let A and B be two $\sigma$-structures.
A $k$-ary constraint is a pair $(\mathbf{x}, R)$, where $\mathbf{x}$ is a $k$-tuple of distinct elements of the
universe of A, and $R$ is a $k$-ary relation over the universe of B. The constraint $(\mathbf{x}, R)$ can
be interpreted as a pair formed by a relation schema $R(\mathbf{x})$ and a relation $R$ conforming
with it. Here, $\mathbf{x}$ is the set of attribute names. Thus, it makes sense to talk about joins and
projections of constraints. We say that a constraint $(\mathbf{x}, R)$ is a superset, or weakening,
of another constraint $(\mathbf{y}, S)$ if $\mathbf{x} = \mathbf{y}$ and $R \supseteq S$.

If there is a homomorphism from A to B, then we say that the instance A of
CSP(B) is satisfiable; otherwise, we say that it is unsatisfiable. Recall from Section 2
that these definitions generalize Boolean satisfiability and unsatisfiability of 3-CNF
formulas. If a CSP instance is unsatisfiable, its satisfiability may be refuted. We are
interested in refutations by means of joins, projections, and weakening. Here, constraints
$(\mathbf{x}, R)$ are viewed as relational schemas $R(\mathbf{x})$ with a relation $R$ conforming with it, as
suggested in the preceding paragraph.

Definition 1 (CSP Refutation). Let A and B be $\sigma$-structures. A CSP(B) proof from
A is a finite sequence of constraints $(\mathbf{x}, R)$, each of which is of one of the following
forms:

1. Axiom: $(\mathbf{x}, R^{\mathbf{B}})$, where $R \in \sigma$ and $\mathbf{x} \in R^{\mathbf{A}}$.

2. Join: $(\mathbf{x} \cup \mathbf{y}, R \bowtie S)$, where $(\mathbf{x}, R)$ and $(\mathbf{y}, S)$ are previous constraints.

3. Projection: $(\mathbf{x} - \{x\}, \pi_{\mathbf{x} - \{x\}}(R))$, where $(\mathbf{x}, R)$ is a previous constraint.

4. Weakening: $(\mathbf{x}, S)$, where $(\mathbf{x}, R)$ is a previous constraint and $R \subseteq S$.

A CSP(B) refutation of A is a proof whose last constraint has an empty relation.

Note that the projections eliminate one variable at a time. We say that the variable
is projected out. The following simple result states that CSP refutations form a sound
and complete method for proving that a given instance of a CSP is unsatisfiable. The
fact that CSP can be reduced to a join of constraints is mentioned already in [GJC94].

Theorem 1 (Soundness and Completeness). Let A and B be $\sigma$-structures. Then A
has a CSP(B) refutation if and only if A is unsatisfiable in CSP(B). In fact, axioms
and joins alone are already enough to refute an unsatisfiable instance.
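
A refutation that uses only axioms and joins can be produced mechanically, as the following Python sketch suggests. It is an illustration under our own assumptions: a constraint is a pair of a variable tuple and a set of value tuples, every tuple of A consists of distinct elements, A has at least one tuple, and the names `join` and `refute` are ours.

```python
def join(c1, c2):
    """Join two constraints, each a pair (variables, set of value tuples)."""
    (xs, r), (ys, s) = c1, c2
    extra = [y for y in ys if y not in xs]
    out = set()
    for t in r:
        row = dict(zip(xs, t))
        for u in s:
            other = dict(zip(ys, u))
            if all(row[y] == other[y] for y in ys if y in row):
                out.add(t + tuple(other[y] for y in extra))
    return tuple(xs) + tuple(extra), out


def refute(a_rels, b_rels):
    """Return a proof (list of constraints) ending in the empty relation,
    or None if joining all axioms leaves a nonempty relation.

    a_rels and b_rels map relation names to sets of tuples.
    """
    axioms = [(x, b_rels[name])
              for name in sorted(a_rels) for x in sorted(a_rels[name])]
    proof = list(axioms)
    current = axioms[0]
    for axiom in axioms[1:]:
        current = join(current, axiom)
        proof.append(current)
    return proof if not current[1] else None


K2 = {"E": {(0, 1), (1, 0)}}
triangle = {"E": {(0, 1), (1, 2), (0, 2)}}
path = {"E": {(0, 1), (1, 2)}}
```

For the triangle against the template $K_2$ the proof consists of three axioms followed by two joins, the last of which is empty; the number of steps is linear in the number of constraints, although the intermediate relations can grow large in general.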

Due to space limitations, we need to omit most proofs in this version of the paper.
The proof of Theorem 1 shows that refutations need not be any longer than linear in the
number of constraints of the CSP instance. However, the critical reader may observe
that the intermediate constraints may be arbitrarily complex. On the other hand, the
rules of projection and weakening can be used to lower this complexity when necessary.
It will become clear later on how this helps in applications. At this point, let us
introduce a formalism to measure the complexity of the intermediate constraints. A
$k$-ary constraint $(\mathbf{x}, R)$ can be identified with a Boolean-valued function $f : B^k \to \{0, 1\}$
by letting $f(a) = 1$ if and only if $a \in R$ (in other words, this is the characteristic
function of the relation $R$). Now, functions of this sort can be represented in various
ways by means of representation classes.

Definition 2 (Representation Class). Let $B$ be a finite set. A representation class for
Boolean-valued functions with domain $B^k$ is a triple $\mathcal{R} = (Q, I, S)$, where $Q$ is a
set, called the set of representations, $I$ is a mapping from $Q$ to the set of functions
$f : B^k \to \{0, 1\}$, called the interpretation, and $S$ is a mapping from $Q$ to the integers,
called the size function.

To be useful for CSP refutations, representation classes should satisfy certain regu- 
larity conditions, such as being closed under joins and projections. In addition, the size 
function should capture the intuitive notion of complexity of a representation. There are 
many examples of representation classes in the literature, particularly when the domain 
B is Boolean, that is, B = {0, 1}. 

Examples. Let B = {0, 1}, and let $A = \{x_1, \ldots, x_n\}$ be a set of propositional
variables. Clauses over A form a representation class. The interpretation of a clause is the
obvious one, and we may define the size of a clause by the number of literals in it. A 
clause C can be thought of as a constraint (x, R), where x is the set of variables in 
C (not literals), and R is the set of truth assignments to the variables that satisfy the 
clause. Unfortunately, clauses are not closed under joins, that is, the join of two clauses 
is not necessarily a clause. Nonetheless, clauses are closed under the resolution rule, 
which can be seen as a combination of one join and one projection (see also [DvB97]). 
Indeed, if $C \vee x$ and $D \vee \neg x$ are clauses, then the resolvent clause $C \vee D$ is precisely the
result of projecting x out of their join. We exploit and elaborate on this connection with 
resolution in Section 5. Binary decision diagrams (BDDs), a.k.a. branching programs 
(BPs), also form a representation class (see Section 5 for a reminder of the definitions).
The interpretation of a BDD is the obvious one, and we may define its size by the num- 
ber of nodes of its graph. BDDs are closed under joins and projections. In fact, BDDs 
are closed under all operations, since BDDs can represent all Boolean functions. More- 
over, when an order on the variables is imposed, the representation of the join can be 
obtained in polynomial time. We will discuss these issues in Section 5. Linear inequalities
$\sum_i a_i x_i \leq a_0$, for integers $a_i$, also form a representation class. The interpretation
of $\sum_i a_i x_i \leq a_0$ is a k-ary Boolean-valued function $f : B^k \to \{0, 1\}$, where k is the
number of variables, defined by $f(b_1, \ldots, b_k) = 1$ if and only if $\sum_i a_i b_i \leq a_0$. The
size of a linear inequality may be defined by the number of bits needed to represent
the k + 1 coefficients, or by $a_0 + \sum_i a_i$ if the coefficients are represented in unary. As
was the case with clauses, linear inequalities are not closed under joins. Representation
classes can also be used to represent functions $f : B^k \to \{0, 1\}$ with non-Boolean
domain B. As long as B is finite, BDDs form an appropriate example. The particular
case of (non-binary) decision trees is also a good example.
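The claim that a resolution step is one join followed by one projection can be checked by brute force on small clauses. The sketch below assumes a clause is encoded as a dictionary from variable names to polarities (True for a positive literal). The helper names `clause_relation` and `join_then_project` are hypothetical and used only for this illustration.

```python
from itertools import product

def clause_relation(clause):
    """Constraint (scope, relation) of a clause: the scope is the sorted tuple
    of its variables, the relation is the set of satisfying assignments."""
    scope = tuple(sorted(clause))
    rel = {t for t in product((0, 1), repeat=len(scope))
           if any(bool(val) == clause[var] for var, val in zip(scope, t))}
    return scope, rel

def join_then_project(c1, c2, x):
    """Join two constraints over {0, 1} and project the variable x out."""
    (s1, r1), (s2, r2) = c1, c2
    scope = tuple(sorted(set(s1) | set(s2)))
    out_scope = tuple(v for v in scope if v != x)
    rel = set()
    for t in product((0, 1), repeat=len(scope)):
        a = dict(zip(scope, t))
        if tuple(a[v] for v in s1) in r1 and tuple(a[v] for v in s2) in r2:
            rel.add(tuple(a[v] for v in out_scope))
    return out_scope, rel

# Resolve (p or x) with (not q or not x) on x; the resolvent is (p or not q).
c_or_x = {"p": True, "x": True}
d_or_not_x = {"q": False, "x": False}
resolvent = {"p": True, "q": False}
derived = join_then_project(clause_relation(c_or_x), clause_relation(d_or_not_x), "x")
print(derived == clause_relation(resolvent))
```

The check relies on x occurring in neither C nor D, as in the statement of the resolution rule.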
The notion of a representation class suggests a syntactic counterpart of the general
notion of CSP refutation. Moreover, it also suggests a way to bound the complexity of 
the intermediate relations in a CSP refutation. Recall that the width of constraint (x, R) 
is the same as its arity, that is, the length of the tuple x. 

Definition 3 (Complexity Measures). Let B be a σ-structure. Let $\mathcal{R} = (Q, I, S)$ be a
representation class for Boolean-valued functions on the universe of B. Let $C_1, \ldots, C_m$
be a CSP(B) proof, and let $r_i \in Q$ be a representation of the constraint $C_i$. We say
that $r_1, \ldots, r_m$ is an $\mathcal{R}$-proof. Its length is m, its size is $S(r_1) + \cdots + S(r_m)$, and its
width is the maximum width of $C_1, \ldots, C_m$.

It was mentioned already that a representation class should satisfy certain regularity 
conditions. The actual conditions depend on the application at hand. One particularly 
useful property is that the representation of a join (projection, weakening) be com- 
putable in polynomial time from the representations of the given constraints. In our 
intended applications, this will indeed be the case. 



4 Refutation Width and Treewidth 

Characterization of Refutation Width. Width has played a crucial role in the devel- 
opment of the theory of CSPs [DP87]. Part of the interest comes from the fact that a 
width upper bound translates, for most representations, to a size bound on individual 
constraints. This is true, for example, for explicit representation and for BDDs. In the 
proof complexity literature, Ben-Sasson and Wigderson [BSW01] viewed it as a complexity
measure for resolution. Here, we adopt the methods for CSP refutations.

Theorem 2. Let A and B be two finite σ-structures. The following are equivalent:

1. A has a CSP(B) refutation of width k. 

2. The Spoiler wins the existential k-pebble game on A and B. 

An intimate connection between pebble games and the notion of strong consistency
[Dec92] was established in [KV00b]. This entails an intimate connection between the
concept of refutation width and the concept of strong consistency. Specifically, it follows
from the results in [KV00b] and the above theorem that A has a CSP(B) refutation
of width k precisely when it is impossible to establish strong k-consistency for A
and B.
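For small instances, the winner of the existential k-pebble game can be computed directly. The sketch below is a naive fixpoint computation, not an algorithm from the paper. It uses the standard characterization that the Duplicator wins iff there is a non-empty family of partial homomorphisms with domains of size at most k that is closed under restrictions and has the forth (extension) property. The encoding of structures as dictionaries of relations and the name `duplicator_wins` are assumptions of this illustration.

```python
from itertools import combinations, product

def is_partial_hom(f, rel_a, rel_b):
    """f maps some elements of A to B; check every tuple of A it fully covers."""
    for name, tuples in rel_a.items():
        for t in tuples:
            if all(x in f for x in t) and tuple(f[x] for x in t) not in rel_b[name]:
                return False
    return True

def duplicator_wins(univ_a, rel_a, univ_b, rel_b, k):
    """Greatest-fixpoint computation of the Duplicator's winning positions."""
    univ_a, univ_b = list(univ_a), list(univ_b)
    family = set()
    for size in range(k + 1):
        for dom in combinations(univ_a, size):
            for img in product(univ_b, repeat=size):
                f = dict(zip(dom, img))
                if is_partial_hom(f, rel_a, rel_b):
                    family.add(frozenset(f.items()))
    changed = True
    while changed:
        changed = False
        for f in list(family):
            dom = {a for a, _ in f}
            # closed under restriction: dropping any one pebble stays in the family
            closed = all(f - {pair} in family for pair in f)
            # forth property: with a free pebble, any new element can be answered
            forth = len(f) == k or all(
                any(f | {(a, b)} in family for b in univ_b)
                for a in univ_a if a not in dom)
            if not (closed and forth):
                family.discard(f)
                changed = True
    return frozenset() in family

def cycle(n):
    """Symmetric edge relation of the n-cycle on vertices 0..n-1."""
    edges = {(i, (i + 1) % n) for i in range(n)}
    return {"E": edges | {(v, u) for u, v in edges}}

k2 = {"E": {(0, 1), (1, 0)}}
print(duplicator_wins(range(3), cycle(3), range(2), k2, 2))
print(duplicator_wins(range(3), cycle(3), range(2), k2, 3))
```

The computation enumerates all partial maps with at most k pebbles, so it is only meant for toy instances such as odd cycles against $K_2$.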

Next we turn to studying the effect of the treewidth of the instance A on the width 
of the CSP refutations. We will need the following result due to Dalmau, Kolaitis and 
Vardi: 

Theorem 3 ([DKV02]). Let k ≥ 2, let A be a finite σ-structure of treewidth less than
k, and let B be a finite σ-structure. Then the following statements are equivalent:

1. There is a homomorphism from A to B. 

2. The Duplicator wins the existential k-pebble game on A and B.
It is immediate from Theorems 2 and 3 that if A is unsatisfiable in CSP(B) and has
treewidth less than k, then A has a CSP(B) refutation of width k. In fact, this result
remains true in a more general situation. A substructure C of A is called a core of A if 
there is a homomorphism from A to C, but, for every proper substructure C' of C, there 
is no homomorphism from A to C'. It is known [HN92] that every finite structure A has 
a unique core up to isomorphism, denoted by core(A), and that A is homomorphically 
equivalent to core(A). In the context of database theory, the treewidth of the core of
A captures exactly the smallest number k such that the canonical conjunctive query
$Q^A$ can be expressed in the existential positive fragment of first-order logic with k
variables [DKV02, Theorem 12]. Now, back to refutations, if A is an unsatisfiable
instance of CSP(B) and the core of A has treewidth less than k, then A also has a 
CSP(B) refutation of width k. Indeed, if A is an unsatisfiable instance of CSP(B), 
so is core(A) because they are homomorphically equivalent; moreover, if core(A) has 
treewidth less than k, then core(A) has a CSP(B) refutation of width less than k. Since
core(A) is a substructure of A, a CSP(B) refutation of core(A) is also a CSP(B) 
refutation of A. 

One may wonder whether the converse is true. Is the treewidth of the core of A 
capturing the width of the refutations of A? Unfortunately, the answer turns out to 
be negative for rather trivial reasons. Take a B such that CSP(B) can be solved by
a k-Datalog program for some fixed k. For example, let B = $K_2$ so that CSP(B)
becomes 2-Colorability, which is expressible in 3-Datalog. Take a graph G which
is not 2-colorable. Hence, the Spoiler wins the existential 3-pebble game on G and $K_2$
[KV00a]. Now just add an arbitrarily large clique to G, that is, let $G' = G \cup K_k$ for
some large k. There still exists a CSP(B) refutation of G' of width 3, but the core of
G' has treewidth at least k − 1. This counterexample, however, suggests that something
more interesting is going on concerning the relationship between existential k-pebble
games and treewidth k.

Theorem 4. Let k ≥ 2, let A and B be two finite σ-structures. Then the following
statements are equivalent:

1. The Duplicator wins the existential k-pebble game on A and B. 

2. If A' is a structure of treewidth less than k and such that there is a homomorphism
   from A' to A, then the Duplicator wins the existential k-pebble game on A' and B.

Proof sketch: (i) ⇒ (ii) is easy. (ii) ⇒ (i): We argue by contraposition. Let $P_B$ be the
k-Datalog program that expresses the query: “Given A, does the Spoiler win the existential
k-pebble game on A and B?” [KV00a]. Assume that the Spoiler wins the existential k-pebble
game on A and B. Hence A satisfies $P_B$, hence it satisfies one of the stages of the k-Datalog
program $P_B$. Each such stage is definable by a union of conjunctive queries, each of which
can be written in the existential positive fragment of first-order logic with k variables.
Hence A satisfies $Q^{A'}$, where A' is a structure of treewidth less than k. Hence, there is
a homomorphism h from A' to A. But also A' satisfies $P_B$, hence the Spoiler wins the
existential k-pebble game on A' and B. □

Now we combine Theorems 2, 3 and 4 to obtain a purely combinatorial characteri- 
zation of when a structure has a CSP refutation of a certain width. 
Corollary 1. Let k ≥ 2, let A and B be two finite σ-structures. The following are
equivalent:

1. A has a CSP(B) refutation of width k. 

2. There exists a structure A' of treewidth less than k and such that there is a
   homomorphism from A' to A, and A' is unsatisfiable in CSP(B).

Note that the characterization of refutation width is stated in terms of treewidth and 
homomorphisms and does not mention refutations at all. Let us add that the structure 
in (2) can be large, so Corollary 1 does not yield any complexity bound for deciding 
whether A has a CSP(B) refutation of width k. As it turns out, it follows from Theorem 2
and the result in [KP03] that this problem is EXPTIME-complete.

Small-Width Proof-Search Algorithms. Next we study the complexity of finding a
satisfying assignment, or refuting the satisfiability, of an instance A of CSP(B) when 
we parameterize by the treewidth k of A. The decision problem has been studied before 
in certain particular cases. When k is bounded by a constant, the problem can be solved 
in polynomial time [DP87, Fre90]. When B is a fixed structure, Courcelle’s Theorem
[Cou90] implies that the problem can be solved in time $2^{O(k)} n$, where n is the size of
A. Indeed, if B is fixed, then satisfiability in CSP(B) can be expressed in monadic 
second-order logic, so Courcelle’s Theorem applies. We consider the case in which 
B and k are not fixed, and also the problem of finding a satisfying assignment, or 
a refutation. In the particular case of Boolean B and resolution refutations, a related 
problem was studied in [AR02] where branchwidth was used instead of treewidth. Our 
proof is more general, rather different, and perhaps simpler. 

Theorem 5. The problem of determining whether a structure A of treewidth k is satisfiable
in CSP(B) can be solved by a deterministic algorithm in time $2^{O(k)} m^{O(k)} n^{O(1)}$,
where n is the size of A and m is the size of B. In particular, the algorithm runs in
polynomial time when k = O(log n / log m). Moreover, if A is satisfiable, the algorithm
produces a homomorphism $h : A \to B$, and if A is unsatisfiable, it produces a
CSP(B) refutation of width k.

Proof sketch. The idea is to build an existential positive sentence ψ, with k variables,
that is a rewriting of the canonical query $Q^A$. This takes time polynomial in the
tree-decomposition of A, which can be found in time [...]. Then we evaluate ψ on
B bottom up, from inner subformulas to the root. Since each subformula involves at
most k variables, this takes time [...] times the size of the formula, which is time
$2^{O(k)} m^{O(k)} n^{O(1)}$ overall. Since $\psi \equiv Q^A$, we have that B satisfies ψ if and only if
there exists a homomorphism from A to B. □



5 Refutations by OBDDs: A Case Study 

Regularity Properties of OBDDs. In this section we study the effect of using ordered 
binary decision diagrams as a representation class for constraints. We focus on the
Boolean case B = {0, 1}. 
For the history on the origins of binary decision diagrams, branching programs, and
ordered binary decision diagrams we refer the reader to the survey by Bryant [Bry92]. 
Here are the definitions. Let $x_1, \ldots, x_n$ be n propositional variables. A binary decision
diagram (BDD), or branching program (BP), represents a Boolean function as a
rooted, directed acyclic graph G. Each non-terminal node u of G is labeled by a variable
$v(u) \in \{x_1, \ldots, x_n\}$, and has arcs toward two children t(u) and f(u), referred to
as the true and the false children respectively. Each terminal node is labeled 0 or 1. For
a truth assignment to the variables $x_1, \ldots, x_n$, the value of the function is determined by
following the path through the directed graph, from the root to a terminal node, accord- 
ing to the labels of the nodes and the values to the variables. The size of a BDD is the 
size of the underlying graph G. An ordered binary decision diagram (OBDD) is a BDD 
in which labeled paths are consistent with a specific total order < over the variables. 
More precisely, for an OBDD we require that the variable labeling a non-terminal node 
be smaller than the variables labeling its non-terminal children, according to a fixed 
order over the variables. 

The main property of OBDDs is that, in their reduced form, they are canonical,
meaning that for a given order, two OBDDs for the same function are isomorphic. An 
immediate consequence is that testing for equivalence of two OBDDs can be solved in 
time polynomial in their size. Most interesting for us is the fact that representations of 
joins and projections are computable in polynomial time, and determining whether an 
OBDD is a weakening of another is decidable in polynomial time. 
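The operations mentioned above can be sketched with a small reduced-OBDD package. This is a minimal illustration and not the authors' implementation. It assumes the variable order 0 < 1 < ... < n-1, shares nodes through a unique table (which is what makes equal functions receive equal node identifiers), implements the join as conjunction, and implements the projection as existential quantification of one variable. The class name `OBDD` and its method names are assumptions of this sketch.

```python
import operator

class OBDD:
    """Reduced OBDDs over variables 0..n-1; node ids 0 and 1 are the terminals."""

    def __init__(self, n):
        self.n = n
        self.node = {}      # id -> (var, low, high)
        self.unique = {}    # (var, low, high) -> id

    def mk(self, var, low, high):
        if low == high:                 # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:      # share isomorphic subgraphs
            self.unique[key] = len(self.node) + 2
            self.node[self.unique[key]] = key
        return self.unique[key]

    def literal(self, var, positive=True):
        return self.mk(var, 0, 1) if positive else self.mk(var, 1, 0)

    def level(self, u):
        return self.n if u < 2 else self.node[u][0]

    def cofactors(self, u, var):
        if self.level(u) == var:
            _, low, high = self.node[u]
            return low, high
        return u, u

    def apply(self, op, u, v, memo=None):
        memo = {} if memo is None else memo
        if u < 2 and v < 2:
            return int(op(u, v))
        if (u, v) not in memo:
            var = min(self.level(u), self.level(v))
            u0, u1 = self.cofactors(u, var)
            v0, v1 = self.cofactors(v, var)
            memo[u, v] = self.mk(var, self.apply(op, u0, v0, memo),
                                 self.apply(op, u1, v1, memo))
        return memo[u, v]

    def restrict(self, u, var, value):
        if self.level(u) > var:         # var does not occur below u
            return u
        top, low, high = self.node[u]
        if top == var:
            return high if value else low
        return self.mk(top, self.restrict(low, var, value),
                       self.restrict(high, var, value))

    def disjoin(self, u, v):
        return self.apply(operator.or_, u, v)

    def join(self, u, v):
        return self.apply(operator.and_, u, v)

    def project_out(self, u, var):
        return self.disjoin(self.restrict(u, var, 0), self.restrict(u, var, 1))

    def is_weakening(self, u, v):
        """True if v is a weakening of u, that is, u implies v."""
        return self.apply(lambda a, b: a & (1 - b), u, v) == 0

bdd = OBDD(3)
c1 = bdd.disjoin(bdd.literal(0), bdd.literal(1))           # x0 or x1
c2 = bdd.disjoin(bdd.literal(1, False), bdd.literal(2))    # not x1 or x2
joined = bdd.join(c1, c2)
resolvent = bdd.project_out(joined, 1)
print(resolvent == bdd.disjoin(bdd.literal(0), bdd.literal(2)))
```

Because of canonicity, the equivalence test in the last line is a comparison of two node identifiers. The `restrict` method is not memoized here, which keeps the sketch short at the cost of efficiency on large diagrams.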

It follows from this that given a CSP refutation $C_1, \ldots, C_m$ with the constraints
represented by OBDDs, the validity of applications of the join rule, the projection rule, 
and the weakening rule, can be checked in polynomial time. Therefore, refutations by 
OBDDs when applied to 3-SAT (a particular CSP(B), see below) form a proof system 
in the sense of Cook and Reckhow [CR79]. 

Strength of Refutations by OBDDs. Let us compare the size of CSP refutations by 
OBDDs with other well-known proof systems for propositional logic. Recall from Sec- 
tion 2 how 3-SAT is represented as a CSP(B) problem. The template structure is T 
and its vocabulary consists of four ternary relations $\{R_0, R_1, R_2, R_3\}$, one for each
type of 3-clause. Thus, structures for this vocabulary are 3-CNF formulas. A refutation 
by OBDDs of a 3-CNF formula A is a refutation of A in CSP(T) when constraints are 
represented by OBDDs for a fixed total order of the variables. Size, length and width of 
refutations by OBDDs are defined according to Definition 3 in Section 3. 

Resolution. The resolution rule is very simple: from $C \vee x$ and $D \vee \neg x$, derive $C \vee D$,
where C and D are clauses in which x does not occur. The goal is to derive the empty 
clause from a given set of initial clauses. The length of a resolution refutation is the 
number of clauses that are used in it. The size of a resolution refutation is the total 
number of literals that appear in it. There are two key observations that concern us 
here. The first is that every clause has a small equivalent OBDD over all orders of the 
variables. The second observation is that $C \vee D$ can be expressed in terms of one join
and one projection from $C \vee x$ and $D \vee \neg x$ (see also [DvB97]). We use both facts for
the following result whose proof will be included in the full paper. 
Theorem 6. Let A be a 3-CNF formula on n variables. If A has a resolution refutation
of length m, then A has a refutation by OBDDs of length 2m and size $O(mn^2)$, even
without using the weakening rule and for every order of the variables. Moreover, there 
is a polynomial-time algorithm that converts the resolution refutation into the refutation 
by OBDDs. 

We will see below that, in fact, refutations by OBDDs are exponentially stronger 
than resolution. As an intermediate step we move to a different CSP: systems of equations
over $Z_2$.

Gaussian calculus. One nice feature of OBDDs is that they give a uniform framework 
for defining all types of constraints. Consider now the CSP defined by systems of linear 
equations over the two-element field $Z_2$, with exactly three variables per equation. That
is, the vocabulary contains two ternary relation symbols $R_0$ and $R_1$ representing the
equations x + y + z = 0 and x + y + z = 1 respectively. The template structure
S contains the truth tables of these equations: that is, $R_0 = \{000, 011, 110, 101\}$ and
$R_1 = \{001, 010, 100, 111\}$. Now CSP(S) coincides with systems of equations over
$Z_2$. The standard method for solving systems of equations is Gaussian elimination. In
fact, Gaussian elimination can be used to refute the satisfiability of systems of equations 
by deriving, for example, 0 = 1 by means of linear combinations that cancel at least 
one variable. This has led to proposing the Gaussian calculus as a proof system [BSI99]. 
Let us see that refutations by OBDDs can polynomially simulate it. Perhaps the most 
interesting point of the proof is that we actually show that weakening is not required, 
which is not immediately obvious. 

Theorem 7. Let A be a system of equations over $Z_2$ with exactly three variables per
equation. If A has a Gaussian calculus refutation of length m, then A has a refutation
by OBDDs in CSP(S) of length 2m and size $O(mn^2)$, even without using the weakening
rule and for every order of the variables. Moreover, there is a polynomial-time
algorithm that converts the Gaussian calculus refutation into the refutation by OBDDs. 

We can now use this result to conclude that for 3-CNF formulas, refutations by 
OBDDs are exponentially stronger than resolution. Consider the standard translation of 
a linear equation x + y + z = a of $Z_2$ into a 3-CNF formula. Namely, for a = 1 the
3-CNF formula is

$(x \vee y \vee z) \wedge (x \vee \neg y \vee \neg z) \wedge (\neg x \vee y \vee \neg z) \wedge (\neg x \vee \neg y \vee z)$,

and the formula for a = 0 is similar. For a system of equations over $Z_2$ with three
variables per equation A, let T(A) be its translation to a 3-CNF formula. It is not hard
to see that if A has a refutation by OBDDs in CSP(S) of length m, then T(A) has
a refutation by OBDDs in CSP(T) of length O(m). The idea is that the join of the
OBDDs for the clauses defining an equation x + y + z = a is precisely an OBDD 
representing the equation x + y + z = a. Therefore, one refutation reduces to the other. 
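The translation can be checked exhaustively. The sketch below generates, for each right-hand side a, one clause per falsifying assignment of x + y + z = a, and verifies that the conjunction of these clauses (that is, their join) has exactly the solutions of the equation. The literal encoding (variable, sign) and the function names are assumptions made for this illustration.

```python
from itertools import product

def parity_clauses(a):
    """Clauses for x + y + z = a over Z_2. Each clause excludes exactly one
    falsifying assignment. A literal is (variable, sign), sign True = positive."""
    clauses = []
    for bits in product((0, 1), repeat=3):
        if sum(bits) % 2 != a:
            # the clause is false only when every variable equals its bit
            clauses.append(tuple((var, bit == 0) for var, bit in zip("xyz", bits)))
    return clauses

def satisfies(clauses, assignment):
    return all(any(bool(assignment[var]) == sign for var, sign in clause)
               for clause in clauses)

def translation_is_exact(a):
    """The clauses for right-hand side a hold exactly on the solutions."""
    return all(
        satisfies(parity_clauses(a), dict(zip("xyz", bits))) == (sum(bits) % 2 == a)
        for bits in product((0, 1), repeat=3))

print(translation_is_exact(0), translation_is_exact(1))
```

For a = 1 the generated clauses are the four clauses displayed above, in the same order.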

The particular system of equations known as Tseitin contradictions [Tse68] is ex- 
ponentially hard for resolution. This was shown by Urquhart [Urq87] and was later 
extended by Ben-Sasson [BS02], who showed the same result for every Frege system
(Hilbert-style system) restricted to formulas of bounded alternation-depth. This
establishes that refutations by OBDDs are exponentially stronger than resolution. For
bounded alternation-depth Frege systems, it shows that in some cases refutations by 
OBDDs might be exponentially stronger. In the next section we see that refutations by 
OBDDs and bounded alternation-depth Frege systems are incomparable. 

Cutting planes. We now show that, in the presence of weakening, refutations by OBDDs 
polynomially simulate the cutting planes proof system with small coefficients. It is well- 
known that clauses can be expressed as linear inequalities over the integers. For example,
the clause $x \vee \neg y \vee z$ can be expressed by $x + (1 - y) + z \geq 1$, or equivalently,
$x - y + z \geq 0$. Therefore, a CNF formula translates into a system of inequalities
over the integers in a natural way. The cutting planes proof system was introduced in
[CCT87]. The lines in the proof are linear inequalities over the integers. There are three
rules of inference: addition, scalar multiplication, and integer division. The only rule that
requires explanation is integer division: from $\sum_i (c \cdot a_i) x_i \geq a_0$ derive
$\sum_i a_i x_i \geq \lceil a_0 / c \rceil$. Intuitively, if all coefficients except the independent term are
divisible by c, then we may divide throughout and round up the independent term. The rule is sound
on the integers, meaning that if the $x_i$'s take integer values that satisfy the hypothesis,
then the conclusion is also satisfied. The goal of the system is to derive a contradiction
$0 \geq 1$ from a given set of linear inequalities. For refuting 3-CNF formulas, each clause
is viewed as a linear inequality as described before.
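The clause translation and the three inference rules can be written down in a few lines. In this sketch an inequality $\sum_i a_i x_i \geq a_0$ is encoded as a pair (coefficient dictionary, $a_0$). This encoding and the function names are assumptions of the illustration.

```python
def clause_to_inequality(clause):
    """clause maps each variable to True (positive) or False (negated).
    A positive literal contributes x and a negated literal contributes 1 - x,
    so the constants are moved to the right-hand side."""
    coeffs = {var: (1 if positive else -1) for var, positive in clause.items()}
    negated = sum(1 for positive in clause.values() if not positive)
    return coeffs, 1 - negated

def add(ineq1, ineq2):
    """Addition rule: add two inequalities term by term."""
    (c1, a1), (c2, a2) = ineq1, ineq2
    total = {v: c1.get(v, 0) + c2.get(v, 0) for v in set(c1) | set(c2)}
    return {v: c for v, c in total.items() if c != 0}, a1 + a2

def scale(ineq, c):
    """Scalar multiplication rule, for a positive integer c."""
    coeffs, a0 = ineq
    if c <= 0:
        raise ValueError("the multiplier must be a positive integer")
    return {v: c * a for v, a in coeffs.items()}, c * a0

def divide(ineq, c):
    """Integer division rule: every coefficient must be divisible by c,
    and the independent term is rounded up."""
    coeffs, a0 = ineq
    if c <= 0 or any(a % c != 0 for a in coeffs.values()):
        raise ValueError("coefficients are not divisible by c")
    return {v: a // c for v, a in coeffs.items()}, -(-a0 // c)

# From (x or y) and (x or not y): x + y >= 1 and x - y >= 0 give 2x >= 1,
# and integer division by 2 yields x >= 1.
first = clause_to_inequality({"x": True, "y": True})
second = clause_to_inequality({"x": True, "y": False})
print(divide(add(first, second), 2))
```

The rounding step is where integrality is used: 2x ≥ 1 only gives x ≥ 1/2 over the rationals, but x ≥ 1 over the integers.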

In order to measure the size of a proof we need to specify an encoding for the 
inequalities. When the coefficients are encoded in unary, the system has been named 
CP* and studied in [BPR97]. We see that refutations by OBDDs can polynomially
simulate CP*. As it turns out, the rule of weakening is strongly used here. Whether
weakening is strictly necessary remains an intriguing open problem.

Theorem 8. Let A be a 3-CNF formula. If A has a CP* refutation of length m and size s,
then A has a refutation by OBDDs of length 2m and size $s^{O(1)}$, for every order of
the variables. Moreover, there is a polynomial-time algorithm that converts the CP* 
refutation into the refutation by OBDDs. 

One consequence of this is that the pigeonhole principle, when encoded propositionally
as an unsatisfiable 3-CNF formula, admits polynomial-size OBDD refutations.
This follows from the known polynomial-size proofs of the pigeonhole principle in
cutting planes [CCT87]. In contrast, the pigeonhole principle requires exponential-size
refutations in resolution [Hak85]. It would be good to find a direct construction of the 
polynomial-size OBDD proof of the pigeonhole principle. 

Interpolation. Craig’s Interpolation Theorem in the propositional setting is this. Let
A(x, y) and B(y, z) be propositional formulas for which x, y and z are pairwise disjoint.
If $A(x, y) \wedge B(y, z)$ is unsatisfiable, then there exists a formula C(y) such that
$A(x, y) \wedge \neg C(y)$ and $C(y) \wedge B(y, z)$ are both unsatisfiable. The promised C(y) is
called an interpolant.
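One way to see why an interpolant always exists is that the formula obtained from A(x, y) by existentially quantifying x is itself an interpolant. The brute-force sketch below computes that interpolant as a set of y-assignments and checks the two unsatisfiability conditions. It takes exponential time and is not the proof-based extraction method discussed in this section. The function names and the encoding of formulas as Python predicates on bit tuples are assumptions of the illustration.

```python
from itertools import product

def cube(n):
    """All 0/1 assignments to n variables."""
    return product((0, 1), repeat=n)

def exists_x_interpolant(A, nx, ny):
    """Set of y-assignments for which some x-assignment satisfies A(x, y)."""
    return {y for y in cube(ny) if any(A(x, y) for x in cube(nx))}

def is_interpolant(C, A, B, nx, ny, nz):
    """Check that A(x,y) and not C(y), and C(y) and B(y,z), are unsatisfiable."""
    a_side = all(y in C for x in cube(nx) for y in cube(ny) if A(x, y))
    b_side = all(not B(y, z) for y in C for z in cube(nz))
    return a_side and b_side

def A(x, y):
    # (x or y) and not x: forces y to be true
    return bool((x[0] or y[0]) and not x[0])

def B(y, z):
    # (not y or z) and not z: forces y to be false
    return bool((not y[0] or z[0]) and not z[0])

C = exists_x_interpolant(A, 1, 1)
print(C, is_interpolant(C, A, B, 1, 1, 1))
```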

Interpolation has been used in propositional proof complexity as a method for lower 
bounds. Following earlier work starting in [IPU94,BPR97], Krajíček [Kra97] suggested
the following approach. Suppose we are given a refutation of $A(x, y) \wedge B(y, z)$.
Suppose, further, that we are able to extract an interpolant C(y) by manipulation from
the proof. Then, lower bounds for the complexity of the interpolants give lower bounds 
for the refutations. This idea has been used successfully for a number of proof systems 
including resolution and cutting planes (see [IPU94,BPR97,Kra97,Pud97]). The fea- 
sible interpolation of resolution has been recently used by McMillan [McM03] as an 
effective abstraction technique in symbolic model checking. 

Our aim is to discuss the fact that refutations by OBDDs have feasible interpolation 
for certain orders of the variables. Following the machinery developed in [IPU94], it is 
enough to observe that evaluating an OBDD requires small communication complexity 
for nice orders. We omit further details in this version and state the final result without 
proof. The narrowness of an OBDD is the maximum number of nodes in a level. 

Theorem 9. Let $F = A(x, y) \wedge B(y, z)$ be an unsatisfiable 3-CNF formula, and let
n = |y|. If F has an OBDD refutation of length m with OBDDs of narrowness bounded
by c, and with an order that is consistent with x < y < z, then F has an interpolant
circuit of size $O(c^2(m + n))$. In particular, if the size of the refutation is s, then the size
of the interpolant is $s^{O(1)}$. In addition, if A(x, y) is monotone in y, then the interpolant
circuit is monotone.

Let us mention that the monotone feasible interpolation of refutations by OBDDs 
establishes a separation from Frege systems with formulas of bounded alternation- 
depth. It is known that monotone interpolants for such systems require exponential size
[Kra97]. This, together with the results of previous sections, establishes that refutations
by OBDDs are incomparable in strength with Frege systems of bounded alternation- 
depth. 



6 Concluding Remarks 

Viewing constraint propagation as a proof system lifts proof complexity from propo- 
sitional logic to all constraint-satisfaction problems. There are many questions that re- 
main open from our work. 

First, it is necessary to have better understanding of the role of the weakening rule. 
We know it is not needed to achieve completeness, not even in the case of restricted refu- 
tation width in Theorem 2. It remains an open problem to determine whether refutation 
by OBDDs without weakening can polynomially simulate CP* refutations. Clarifying 
the role of weakening is also important for algorithmic applications. Second, the proof 
complexity of refutations by OBDDs needs further development. One problem that is 
left open is to find a non-trivial lower bound for the size of refutations by OBDDs 
that holds for every order of the variables. Another problem that is left open is whether 
OBDD-based refutations polynomially simulate cutting planes with coefficients written 
in binary. Are OBDD-based refutations automatizable in the sense of [BPR00]? Can we
use the feasible interpolation of OBDD-based refutations in an effective manner analo- 
gous to that of McMillan [McM03]? 

Finally, it would be good to find practical decision procedures based on CSP proofs, 
the same way that the DPLL approach is based on resolution. Some progress in this 
direction is reported in [DR94], which reports on SAT-solving using directional resolu-
tion, and in [PV04], which reports on SAT-solving using OBDD-based refutations. This 
could lead to CSP-solvers that deal directly with the CSP instances, avoiding the need 
to translate to a propositional formula and use a SAT-solver, as is sometimes done.
Abstract. A constraint satisfaction problem (CSP) model can be preprocessed
to ensure that any choices made will lead to solutions, without the need to back- 
track. This can be especially useful in a real-time process control or online inter- 
active context. The conventional machinery for ensuring backtrack-free search, 
however, adds additional constraints, which may require an impractical amount 
of space. A new approach is presented here that achieves a backtrack-free repre- 
sentation by removing values. This may limit the choice of solutions, but we are 
guaranteed not to eliminate them all. We show that in an interactive context our 
proposal allows the system designer and the user to collaboratively establish the 
tradeoff in space complexity, solution loss, and backtracks. 



1 Introduction 

For some applications of constraint computation, backtracking is highly undesirable or 
even impossible. Online, interactive configuration requires a fast response given human 
impatience and unwillingness to undo previous decisions. Backtracking is likely to lead 
the user to abandon the interaction. In another context, an autonomous spacecraft must 
make and execute scheduling decisions in real time [10]. Once a decision is executed 
(e.g., the firing of a rocket) it cannot be undone through backtracking. It may not be 
practical in real time to explore the implications of each potential decision to ensure 
that the customer or the spacecraft is not allowed to make a choice that leads to a “dead
end”. A standard technique in such an application is to compile the constraint problem 
into some form that allows backtrack-free access to solutions [12]. Except in special 
cases, the worst-case size of the compiled representation is exponential in the size of the 
original problem. Therefore, the common view of the dilemma is as a tradeoff between 
storage space and backtracks: worst-case exponential space requirements can guaran- 
tee backtrack-free search while bounding space requirements (e.g., through adaptive 
consistency [3] techniques with a fixed maximum constraint arity) leave the risk of 
backtracking. For an application such as the autonomous spacecraft where memory is 
scarce and backtracking is impossible, the two-way tradeoff provides no solution. 

With the above two examples in mind, in this paper, we assert that the two-way 
tradeoff is too simple to be applicable to all interesting applications. We propose that 
there is a three-way tradeoff: storage space, backtracking, and solution retention. In
the extreme, we propose a simple, radical approach to achieving backtrack-free search 
in a CSP: preprocess the problem to remove values that lead to dead-ends. Consider 
a coloring problem with variables {X, Y, Z} and colors {red, blue}. Suppose Z must 
be different from both X and Y and our variable ordering is lexicographic. There is a 
danger that the assignments X = red, Y = blue will be made, resulting in a domain 
wipeout for Z. A conventional way of fixing this would be to add a new constraint 
between X and Y specifying that the tuple in question is prohibited. Such a constraint 
requires additional space, and, in general, such adaptive consistency enforcement may 
have to add constraints involving as many as n − 1 variables for an n-variable problem.
Our basic insight here is simple, but counter-intuitive. We will “fix” the problem by 
removing the choice of red for X. One solution, {X = red, Y = red, Z = blue}, 
is also removed but another remains, {X = blue,Y = blue, Z = red}. If we also 
remove red as a value for Y we are left with a backtrack-free representation for the 
whole problem. (The representation also leaves us with a single set of values comprising 
a single solution. In general we will only restrict, not eliminate, choice.) 
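The coloring example is small enough to check exhaustively. The sketch below hard-codes the three variables, the lexicographic ordering and the two disequality constraints, and tests whether a given choice of domains is backtrack-free, meaning that every consistent assignment to a prefix of the ordering extends to the next variable. The function names are assumptions of this illustration.

```python
from itertools import product

ORDER = ("X", "Y", "Z")
DIFFERENT = (("X", "Z"), ("Y", "Z"))      # Z must differ from X and from Y

def consistent(assignment):
    return all(assignment[u] != assignment[v]
               for u, v in DIFFERENT if u in assignment and v in assignment)

def solutions(domains):
    candidates = (dict(zip(ORDER, values))
                  for values in product(*(sorted(domains[v]) for v in ORDER)))
    return [a for a in candidates if consistent(a)]

def backtrack_free(domains):
    """True if every consistent assignment to a prefix of ORDER can be
    extended to the next variable without undoing earlier choices."""
    for i, nxt in enumerate(ORDER):
        prefix = ORDER[:i]
        for values in product(*(sorted(domains[v]) for v in prefix)):
            a = dict(zip(prefix, values))
            if consistent(a) and not any(consistent({**a, nxt: w})
                                         for w in domains[nxt]):
                return False
    return True

full = {v: {"red", "blue"} for v in ORDER}
no_red_x = {**full, "X": {"blue"}}
no_red_xy = {**no_red_x, "Y": {"blue"}}
for domains in (full, no_red_x, no_red_xy):
    print(backtrack_free(domains), len(solutions(domains)))
```

The three cases correspond to the original problem, the problem after removing red from X, and the problem after also removing red from Y. Only the last one is backtrack-free, and it retains one of the two original solutions.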

The core of our proposal is to preprocess a CSP to remove values that lead to a 
dead-end. This allows us to achieve a “backtrack-free” representation (BFR) where all 
remaining solutions can be enumerated without backtracking and where the space com- 
plexity is the same as for the original CSP. We are able to achieve backtrack-free search 
and polynomially bounded storage requirements. The major objection to such an ap- 
proach is that the BFR will likely only represent a subset of solutions. There are two 
responses to this objection demonstrating the utility of treating backtracks, space, and 
solutions together. First, for applications where backtracking is impossible, an exten- 
sion of the BFR approach provides a tradeoff between space complexity and solution 
retention. Through the definition of a k-BFR, systems designers can choose a point
in this tradeoff. Value removal corresponds to 1-BFR, where space complexity is the 
same as for the original representation and many solutions may be lost. The value of 
k is the maximum arity constraint that we add during preprocessing: higher k leads to 
higher space complexity but fewer lost solutions. The memory capacity can therefore 
be directly traded off against solution loss. Second, for applications where backtracks 
are only undesirable, we can allow the user to make the decision about solution loss vs. 
backtracks by using two representations: the original problem and the BFR. The latter 
is used to guide value choices by representing the sub-domain for which it is guaranteed 
that a backtrack-free solution exists. The user can choose between a conservative value 
assignment that guarantees a backtrack-free solution or a risky approach that does not 
have such guarantees but allows access to all solutions. 

Overall, the quality of a BFR depends on the number and quality of the solutions 
that are retained. After presenting the basic BFR algorithm and proving its correctness, 
we turn to these two measures of BFR quality through a series of empirical studies that 
examine extensions of the basic algorithm to include heuristics, consistency techniques, 
preferences on solutions, and the representation of multiple BFRs. Each of the empir- 
ical studies represents an initial investigation of different aspects of the BFR concept 
from the perspective of quality. We then present the k-BFR concept, revisit the issue of 
solution loss, and show how our proposal provides a new perspective on a number of 
dichotomies in constraint computation. 

Algorithm 1: BFRB - computes a BFR. 

BFRB(n): 
Obtains a BFR for P_n (maintained as a global variable) 

1  if domain of V_n is empty then 
2      report Failure 
3  if n = 1 then 
4      report Success 
5  foreach solution S to the parent subproblem that does not extend to V_n do 
6      Choose a value v in S and remove it from the domain of its variable. 
7  recursively seek a BFR for P_{n-1}: 
8      If successful, report Success. 
9      If not, make one different choice of a value to remove, and recurse again. 
10     When there are no more different choices to make, report Failure. 

The primary contributions of this paper are the proposal of a three-way tradeoff 
between space complexity, backtracks, and solution retention, the somewhat counter- 
intuitive idea of creating backtrack-free representations by value removal, the empirical 
investigation of a number of variations of this basic idea, and the placing of the idea in 
the context of a number of important threads of constraint research. 

1.1 Related Work 

Early work on CSPs guaranteed backtrack-free search for tree-structured problems [4]. 
This was extended to general CSPs through k-trees [5] and adaptive consistency [3]. 
These methods have exponential worst-case complexity, but, for preprocessing, time 
is not a critical factor as we assume we have significant time offline. However, these 
methods also have exponential worst-case space complexity, which may indeed make 
them impractical. 

Efforts have been made in the past to precompile all solutions in a compact form [1, 
8, 9, 11, 12]. These approaches achieved backtrack-free search at the cost of worst-case 
exponential storage space. While a number of interesting techniques to reduce average 
space complexity (e.g., meta-CSPs and interchangeability [12]) have been investigated, 
they do not address the central issue of worst-case exponential space complexity. Indeed, 
as far as we have been able to determine, the need to represent all solutions has not been 
questioned in existing work. Furthermore, recasting the problem as a three-way tradeoff 
between space complexity, backtracks, and solution retention appears novel. 

2 Algorithm, Alternatives, and Analysis 

We describe a basic algorithm for obtaining a BFR by deleting values, prove it correct, 
and examine its complexity. Given a problem P and a variable search order V_1 to V_n, we 
will refer to the subproblem induced by the first k variables as P_k. A variable V_i is a parent 
of V_k if it shares a constraint with V_k and i < k. We call the subproblem induced by the parents 
of V_k the parent subproblem of V_k. P_n will be a backtrack-free representation if we 
can choose values for V_1 to V_n without backtracking. BFRB operates on a problem and 
produces a backtrack-free representation of the problem if it is solvable; otherwise it reports 
failure. We will refer to the algorithm’s removal of solutions to the parent subproblem 
of V_k that do not extend to V_k as processing of V_k. 

The BFRB algorithm is quite straightforward. It works its way upwards through a 
variable ordering, ensuring that no trouble will be encountered in a search on the way 
back down, as does adaptive consistency; but here difficulties are avoided by removing 
values rather than adding (or refining) constraints. (Of course, removing a value can be 
viewed as adding/refining a unary constraint.) 
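
As an illustration, the recursion of Algorithm 1 can be sketched in Python for binary CSPs. The sketch assumes that variables are numbered 0 to n-1 in search order, that domains are given as a list of sets, and that `constraints` maps a pair (i, j) with i < j to its set of allowed value pairs. The names `allowed`, `find_dead_end`, and `bfrb` are illustrative and do not come from the original presentation. The sketch removes one dead-end at a time and tries a different removal whenever a branch fails, which is one possible reading of lines 5 to 10.

```python
from itertools import product

def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def find_dead_end(domains, constraints, k):
    """Return a solution of the parent subproblem of V_k with no support in V_k, or None."""
    ps = sorted(i for (i, j) in constraints if j == k)  # parents of V_k
    for values in product(*(sorted(domains[p]) for p in ps)):
        s = dict(zip(ps, values))
        # s must satisfy every constraint among the parents themselves
        if not all(allowed(constraints, p, s[p], q, s[q])
                   for p in ps for q in ps if p < q):
            continue
        # s is a dead-end if no value of V_k is compatible with all parents
        if not any(all(allowed(constraints, p, s[p], k, b) for p in ps)
                   for b in domains[k]):
            return s
    return None

def bfrb(domains, constraints, k=None):
    """Return backtrack-free domains (a list of sets), or None on failure."""
    if k is None:
        k = len(domains) - 1
    if not domains[k]:
        return None                      # lines 1-2: empty domain
    if k == 0:
        return domains                   # lines 3-4: first variable reached
    dead = find_dead_end(domains, constraints, k)
    if dead is None:
        return bfrb(domains, constraints, k - 1)   # line 7
    for var, val in dead.items():        # line 6, retried per lines 9-10
        pruned = [set(d) for d in domains]
        pruned[var].discard(val)
        result = bfrb(pruned, constraints, k)
        if result is not None:
            return result
    return None

# Hypothetical colouring-style instance: X, Y in {blue, red}, Z in {red}, X != Z, Y != Z.
doms = [{"blue", "red"}, {"blue", "red"}, {"red"}]
cons = {(0, 2): {("blue", "red")}, (1, 2): {("blue", "red")}}
print(bfrb(doms, cons))
```

Because the example instance has a single solution, any BFR for it must reduce every domain to one value.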

However, correctness is not as obvious as it might first appear. It is clear that a BFR 
to a soluble problem must exist; any individual solution provides an existence proof: 
simply restrict each variable domain to the value in the solution. However, we might 
worry that BFRB might not notice if the problem is insoluble, or in removing values it 
might in fact remove all solutions, without noticing it. 

Theorem 1 If P is soluble, BFRB will find a backtrack-free representation. 

Proof: Proof by induction. 

Inductive step: If we have a solution s to P_{k-1} we can extend it to a solution to 
P_k without backtracking. Solution s restricted to the parents of V_k is a solution to the 
parent subproblem of V_k. There is a value, b, for V_k consistent with this solution, or else 
this solution would have been eliminated by BFRB. Adding b to s gives us a solution to 
P_k, since we only need worry about the consistency of b with the parents of V_k. 

Base step: P_1 is soluble, i.e. the domain of V_1 is not empty after BFRB. Since P is 
soluble, let s be one solution, with s_1 as the value for V_1. We will show that if it does 
not succeed otherwise, BFRB will succeed by providing a representation that includes 
s_1 in the domain of V_1. We will do this by demonstrating, again by induction, that in 
removing a solution to a subproblem, s_p, BFRB will always have a choice that does not 
involve a value of s. Suppose BFRB has proceeded up to V_k without deleting any value 
in s. It is processing V_k and a solution s_p to the parent subproblem does not extend to 
V_k. If all the values in s_p are in s, then there is a value in V_k that is consistent with them, 
namely the value for V_k in s. So one of the values in s_p must not be in s, and BFRB can 
choose at some point to remove it. (The base step for V_n is trivial.) Now since BFRB 
tries, if necessary, all choices for removing values, BFRB will choose eventually, if 
necessary, not to remove any value in s, including s_1. □ 

Theorem 2 If P is insoluble, BFRB will report failure. 

Proof: Proof by induction. 

P_n = P is given insoluble. We will show that if P_k is insoluble, then after BFRB 
processes V_k, P_{k-1} is insoluble. Thus eventually BFRB will always backtrack when P_1 
becomes insoluble (the domain of V_1 is empty) if not before, and BFRB will eventually 
run out of choices to try, and report failure. 

Suppose P_k is insoluble. We will show that P_{k-1} is insoluble in a proof by contradiction. 
Suppose s is a solution of P_{k-1}. Then s restricted to the parents of V_k, s_p, is 
a solution of the parent subproblem of V_k, which is a subproblem of P_{k-1}. There is a 
value b of V_k consistent with s_p, for otherwise s_p would have been eliminated during 
processing of V_k. But if b is consistent with s_p, s plus b is a solution to P_k. Contradiction. □ 

The space complexity of BFRB is polynomial in the number of variables and values, 
as we are only required to represent the domains of each variable. The worst-case time 
complexity is, of course, exponential in the number of variables, n. However, as we will 
see in the next section, by employing a “seed solution”, we can recurse without fear of 
failure, in which case the complexity can easily be seen to be exponential in (p + 1), 
where p is the size of the largest parent subproblem. Of course, p + 1 may still equal n 
in the worst case; but when this is not so, we have a tighter bound on the complexity. 

3 Extensions and Empirical Analysis 

Preliminary empirical analysis showed that the basic BFRB algorithm is ineffective in 
finding BFRs even for small problems. We therefore created a set of basic extensions 
that significantly improved the algorithm performance. We then performed further ex- 
periments, building on these basic extensions. In this section, we present and empiri- 
cally analyze these extensions. 

3.1 Basic Extensions 

A significant part of the running time of BFRB was due to the fallibility of the value 
pruning decision at line 6. While BFRB is guaranteed to eventually find a BFR for a 
soluble problem, doing so may require “backtracking” to previous pruning decisions 
because all solutions had been removed. To remove this thrashing, our first extension to 
BFRB is to develop a BFR around a “seed” solution. Secondly, no consistency enforce- 
ment is present in BFRB. It seems highly likely that such enforcement will reduce the 
search and increase the quality of the BFRs. 

Seed Solutions. In searching for a BFR, we can avoid the need to undo pruning de- 
cisions by guaranteeing that at least one solution will not be removed. We do this by 
first solving the standard CSP (i.e., finding one solution) and then using this solution 
as a seed during the preprocessing to find a BFR. We modify the BFRB pruning (line 
6) to specify that it cannot remove any values in the seed solution. This is sufficient to 
guarantee that there will never be a need to undo pruning decisions. There is a compu- 
tational cost to obtaining the seed, and preserving it reduces the flexibility we have in 
choosing which values to remove; but we avoid thrashing when finding a BFR and thus 
improve the efficiency of our algorithm. In addition, seed solutions provide a mecha- 
nism for guaranteeing that a solution preferred by the system designer is represented in 
the BFR. 

Experiments indicated that not only is using a seed significantly faster, it also tends 
to produce BFRs which represent more solutions. Given the strength of these results, 
all subsequent experiments are performed using a seed. 
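
The seed-based variant can be sketched as follows, under the same assumed representation (variables numbered in search order, domains as a list of sets, binary constraints as sets of allowed pairs). The function names `find_seed` and `seeded_bfr` are illustrative, and the rule "remove the first non-seed value of the dead-end" is only a placeholder for whatever pruning choice is actually used. Since no seed value is ever removed, the loop never needs to undo a pruning decision.

```python
from itertools import product

def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def find_seed(domains, constraints, partial=()):
    """Plain backtracking search for one solution (the seed), or None."""
    k = len(partial)
    if k == len(domains):
        return list(partial)
    for b in sorted(domains[k]):
        if all(allowed(constraints, i, partial[i], k, b) for i in range(k)):
            seed = find_seed(domains, constraints, partial + (b,))
            if seed is not None:
                return seed
    return None

def seeded_bfr(domains, constraints, seed):
    """Prune dead-ends from V_n down to V_2 without touching any seed value."""
    doms = [set(d) for d in domains]
    for k in range(len(doms) - 1, 0, -1):
        ps = sorted(i for (i, j) in constraints if j == k)  # parents of V_k
        while True:
            dead = None
            for values in product(*(sorted(doms[p]) for p in ps)):
                s = dict(zip(ps, values))
                is_parent_solution = all(allowed(constraints, p, s[p], q, s[q])
                                         for p in ps for q in ps if p < q)
                has_support = any(all(allowed(constraints, p, s[p], k, b) for p in ps)
                                  for b in doms[k])
                if is_parent_solution and not has_support:
                    dead = s
                    break
            if dead is None:
                break
            # A dead-end always contains at least one value outside the seed.
            var, val = next((v, a) for v, a in dead.items() if seed[v] != a)
            doms[var].discard(val)
    return doms

# Hypothetical instance: X < Y with both domains {1, 2, 3}.
doms = [{1, 2, 3}, {1, 2, 3}]
cons = {(0, 1): {(1, 2), (1, 3), (2, 3)}}
seed = find_seed(doms, cons)
print(seed, seeded_bfr(doms, cons, seed))
```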
Fig. 1. Absolute and relative number of solutions retained. 



Enforcing Consistency. Given the usefulness of consistency enforcement in standard 
CSP solving, we expect it will both reduce the effort in searching for a BFR and, since 
non-AC values may lead to dead-ends, reduce the pruning decisions that must be made. 
We experimented with two uses of arc consistency (AC): establishing AC once in a 
preprocessing step and establishing AC whenever a value is pruned. The latter variation 
proved to incur less computational effort as measured in the number of constraint checks 
to find a BFR and resulted in BFRs which represented more solutions. In our subsequent 
experiments, we, therefore, establish AC whenever a value is pruned. 
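
For concreteness, arc consistency could be re-established after each pruning step with a routine along the following lines. The text does not say which AC algorithm was used; the sketch below assumes a plain AC-3 queue over the same hypothetical representation (a list of domain sets and a dictionary of allowed pairs keyed by (i, j) with i < j). Note that values of a seed solution support each other, so this filtering can never remove them.

```python
from collections import deque

def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def ac3(domains, constraints):
    """Return arc-consistent copies of the domains, or None if a domain is wiped out."""
    doms = [set(d) for d in domains]
    arcs = deque()
    for (i, j) in constraints:
        arcs.append((i, j))
        arcs.append((j, i))
    while arcs:
        i, j = arcs.popleft()
        # Values of V_i with no supporting value in V_j
        unsupported = {a for a in doms[i]
                       if not any(allowed(constraints, i, a, j, b) for b in doms[j])}
        if unsupported:
            doms[i] -= unsupported
            if not doms[i]:
                return None
            # Re-examine every other neighbour of V_i
            for (p, q) in constraints:
                if q == i and p != j:
                    arcs.append((p, i))
                if p == i and q != j:
                    arcs.append((q, i))
    return doms

# Hypothetical instance: X < Y with both domains {1, 2, 3}.
print(ac3([{1, 2, 3}, {1, 2, 3}], {(0, 1): {(1, 2), (1, 3), (2, 3)}}))
```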

Experiments: Solution Coverage. Our first empirical investigation is to assess the 
number of solutions that the BFR represents. To evaluate our algorithm instantiations, 
we generated random binary CSPs specified with 4-tuples (n, m, d, t), where n is the 
number of variables, m the size of their domains, d the density (i.e. the proportion of 
pairs of variables that have a constraint over them) and t the tightness (i.e. the proportion 
of inconsistent pairs in each constraint). We generated at least 50 problems for each 
tested CSP configuration where we could find a solution for at least 30 instances. In the 
following we refer to the mean of those 30 to 50 instances. 
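
A generator for such (n, m, d, t) instances might look like the sketch below. The text does not state which random model was used, so the sketch assumes one common scheme: exactly round(d * n(n-1)/2) constrained pairs are drawn, and each constraint forbids exactly round(t * m^2) value pairs. The function name and the `rng` parameter are illustrative.

```python
import random
from itertools import combinations, product

def random_binary_csp(n, m, d, t, rng=None):
    """Return (domains, constraints) for a random binary CSP.

    domains: list of n sets, each {0, ..., m-1}
    constraints: dict mapping (i, j), i < j, to the set of ALLOWED value pairs
    """
    rng = rng or random.Random()
    domains = [set(range(m)) for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    constrained = rng.sample(pairs, round(d * len(pairs)))       # density d
    all_tuples = list(product(range(m), repeat=2))
    constraints = {}
    for pair in constrained:
        forbidden = set(rng.sample(all_tuples, round(t * len(all_tuples))))  # tightness t
        constraints[pair] = set(all_tuples) - forbidden
    return domains, constraints

domains, constraints = random_binary_csp(6, 5, 0.2, 0.4, random.Random(0))
print(len(constraints), [len(rel) for rel in constraints.values()])
```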

While the absolute number of represented solutions naturally decreases when the 
problems become harder, the relative number of solutions represented decreases first 
and then increases. The decreasing lines in Figure 1 represent the absolute number 
of solutions for the original problem and the BFR for (15, 10, 0.1, t) problems. The 
numbers for t < 0.4 are estimated from the portion of solutions in the observed search 
space and the size of the non-observed search space. Experiments with fewer samples 
revealed similar patterns for smaller (10, 5, 0.5, t) and larger (50, 20, 0.07, t) problems. 
Fig. 2. Percentage of solutions retained by the BFR with differing density and tightness. 



The increase in the relative number of solutions retained for very hard problems 
can be explained by the fact that a BFR always represents at least one solution and 
that the original problems have only very few solutions for these problem sets. For the 
very easy problems, there may not be much need to backtrack and thus to prune when 
creating the BFR. In the extreme case, the original problem is already backtrack-free. In 
a larger set of experiments we observed the decreasing/increasing behaviour for a range 
of density and tightness values. In Figure 2, we show the results of this experiment with 
(15, 10, d, t) problems, where d ∈ {0.4, 1} and t ∈ {0.1, 1} both in steps of 0.1. 



Experiments: Computational Effort. Now we consider the computational effort re- 
quired when using BFRs. Our main interest is the offline computational effort to find a 
BFR. The online behaviour is also important; however, in a BFR all remaining solutions 
can be enumerated in linear time (in the number of solutions). As this is optimal, empirical 
analysis does not seem justified. Similarly, the (exponential) behaviour of finding 
solutions in a standard CSP is well-known. Figure 3 presents the CPU time for the 
problems considered in our experiments including the time to compute a seed solution. 
Times were found using C++ on a 1.8 GHz Pentium 4 with 512 MB of memory running Windows 2000. 

It can be seen that the time to find BFRs scales well enough to produce them easily 
for the larger problems of our test set. 
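
The online side can be illustrated with a short enumeration routine. It assumes the same hypothetical representation as before (variables in search order, a list of domain sets, binary constraints as sets of allowed pairs) and never undoes an assignment: in a genuine BFR every consistent partial assignment has at least one consistent extension, so the assertion below should never fire. The name `enumerate_bfr` is illustrative.

```python
def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def enumerate_bfr(domains, constraints):
    """Yield every solution of a backtrack-free representation, in variable order."""
    n = len(domains)

    def extend(partial):
        k = len(partial)
        if k == n:
            yield tuple(partial)
            return
        live = [b for b in sorted(domains[k])
                if all(allowed(constraints, i, partial[i], k, b) for i in range(k))]
        # In a BFR each branch leads to a solution, so there is always a live value.
        assert live, "representation is not backtrack-free"
        for b in live:
            yield from extend(partial + [b])

    yield from extend([])

# Hypothetical BFR for X < Y: X in {1, 2}, Y in {1, 2, 3}.
cons = {(0, 1): {(1, 2), (1, 3), (2, 3)}}
print(list(enumerate_bfr([{1, 2}, {1, 2, 3}], cons)))
```

Running the same routine on the unpruned domains {1, 2, 3} for both variables trips the assertion at X = 3, which is exactly the dead-end that value removal eliminates.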



Experiments: Solution Quality. The second criterion for a BFR is that it retains good 
solutions assuming a preference function over solutions. To investigate this, we examine 
a set of lexicographic CSPs [7] where the solution preference can be expressed via a 
re-ordering of variables and values such that lexicographically smaller solutions are 
preferred. 

Fig. 3. Average runtime (seconds) to produce BFR. 

To generate the BFR, a seed solution was found using lexicographic variable and 
value ordering heuristics that ensure that the first solution found is optimal. The best 
solution will thus be protected during the creation of the BFR and always be represented 
by it. For the evaluation of the BFR we used the set of its solutions or a subset of it that 
could be found within a time limit. This set was evaluated with both quantitative and 
qualitative measures: the number of solutions and their lexicographic rank. In Figure 4 
we present such an evaluation for (10, 5, 0.25, 0.7) problems. The problem instances 
are shown on the x-axis while the solutions are presented on the y-axis with increasing 
quality. The solid line shows the total number of solutions of the original problem, 
which we used to sort the different problems to make the graph easier to read. Every 
‘x’ represents a solution retained by the BFR for this problem instance. In the figure we 
can observe, for example, that instance 35 has 76 solutions and its BFR has a cluster 
of very high quality (based on the lexicographic preferences) and a smaller cluster of 
rather poor quality solutions. 

3.2 Pruning Heuristics 

With the importance of value ordering heuristics for standard CSPs, it seems reasonable 
that the selection of the value to be pruned in BFRB may benefit from heuristics. It is 
unclear, however, how the standard CSP heuristics will transfer to BFRB. We examined 
the following heuristics: 

- Domain size: remove a value from the variable with minimum or maximum domain 
size. 

- Degree: remove a value from the variable with maximum or minimum degree. 

- Lexicographic: given the lexicographic preference on solutions, remove low values 
from important variables, in two different ways: (1) prune the value from the lowest 
variable whose value is greater than its position or (2) prune any value that is not 
among the best 10% in the most important 20% of all variables. 

- Random: remove a value from a randomly chosen variable. 
Fig. 4. Solutions maintained by a BFR generated with min-degree heuristic for (10, 5, 0.25, 0.7) 
problems. 

Since we are using a seed solution, if the heuristically preferred value occurs in the 
seed solution, the next most preferred value is pruned. We are guaranteed that at least 
one parent will have a value that is not part of the seed solution or else we would not 
have found a dead-end. 
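
The domain-size, degree, and random heuristics can be sketched as a single selection function over the values of a dead-end. The two lexicographic heuristics are left out because their exact definition depends on preference data not reproduced here. The function and heuristic names are illustrative, and the dead-end is assumed to be a dictionary from variable index to value.

```python
import random

def degree(constraints, var):
    """Number of constraints in which the variable takes part."""
    return sum(1 for scope in constraints if var in scope)

def pick_prune(dead_end, domains, constraints, seed, heuristic="random", rng=None):
    """Choose the (variable, value) of a dead-end to prune, never a seed value."""
    # Skipping seed values first is equivalent to falling back to the
    # next most preferred value whenever the preferred one is in the seed.
    candidates = [(v, a) for v, a in dead_end.items() if seed[v] != a]
    if not candidates:
        raise ValueError("dead-end lies entirely inside the seed solution")
    if heuristic == "random":
        return (rng or random).choice(candidates)
    keys = {
        "min_domain": lambda c: len(domains[c[0]]),
        "max_domain": lambda c: -len(domains[c[0]]),
        "min_degree": lambda c: degree(constraints, c[0]),
        "max_degree": lambda c: -degree(constraints, c[0]),
    }
    return min(candidates, key=keys[heuristic])

# Hypothetical dead-end over the parents V_0, V_1, V_2 of V_3.
dead_end = {0: "a", 1: "b", 2: "c"}
domains = [{"a", "p", "q"}, {"b"}, {"c", "r"}, {"z"}]
constraints = {(0, 3): set(), (1, 3): set(), (2, 3): set(), (0, 1): set()}
seed = ["p", "b", "r", "z"]
print(pick_prune(dead_end, domains, constraints, seed, "min_domain"))
```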

Experiments. BFRs were found with each of the seven pruning heuristics. Using a 
set of 1600 problems with varying tightness and density, we observed little difference 
among the heuristics: none performed significantly better than random on either number 
or quality of solutions retained. Apparently, our intuitions from standard CSP heuristics 
are not directly applicable to finding good BFRs. Further work is necessary to under- 
stand the behaviour of these heuristics and to determine if other heuristics can be applied 
to significantly improve the solution retention and quality of the BFRs. 

3.3 Probing 

Since we want BFRs to represent as many solutions as possible, it is useful to model 
the finding of BFRs as an optimization problem rather than as a satisfaction problem. 
There are a number of ways to search for a good BFR, for example, by performing a 
branch-and-bound to find the BFR with the maximal number of solutions. A simple 
technique investigated here is blind probing. Because we generate BFRs starting with a 
seed solution, we can iteratively generate seed solutions and corresponding BFRs and 
keep the BFR that retains the most solutions. This process is continued until no improv- 
ing BFR is found in 1000 consecutive iterations. Probing is incomplete in the sense that 
it is not guaranteed to find the BFR with maximal coverage. However, not only does 
such a technique provide significantly better BFRs based on solution retention, it also 
provides a baseline against which to compare our satisfaction-based BFRs. 
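
The probing loop itself is simple and can be sketched independently of how a BFR is built. In the sketch below, `build_bfr` and `score` are hypothetical callables: the first is assumed to draw a fresh seed and return the corresponding BFR, the second to return the number of solutions that BFR retains. The stopping rule follows the description above, with the patience of 1000 iterations exposed as a parameter.

```python
def probe(build_bfr, score, patience=1000):
    """Blind probing: keep the best candidate until `patience` consecutive
    iterations bring no improvement. Returns (best_candidate, best_score)."""
    best, best_score, stale = None, float("-inf"), 0
    while stale < patience:
        candidate = build_bfr()      # e.g. new random seed solution + seeded BFR
        value = score(candidate)     # e.g. number of solutions retained
        if value > best_score:
            best, best_score, stale = candidate, value, 0
        else:
            stale += 1
    return best, best_score

# Toy usage with a fixed stream of scores standing in for generated BFRs.
stream = iter([3, 1, 7, 2, 3, 1, 7, 2, 9])
print(probe(lambda: next(stream), score=lambda x: x, patience=5))
```

In the toy run the final 9 is never examined, which illustrates why probing is incomplete.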
Experiments. Table 1 presents the number of solutions using random pruning with 
and without probing on seven different problem sets each with 50 problem instances. 
Probing is almost always able to find BFRs with higher solution coverage. On average, 
the probing-based BFRs retain more than twice as many solutions as the BFRs produced 
without probing. 



Table 1. Average number of solutions with and without probing. 

Problem                 No Probing      Probing 
(10, 10, 0.75, 0.3)          41.36       274.90 
(10, 20, 0.5, 0.3)          774.26      3524.22 
(10, 5, 0.25, 0.3)       121463.08    134494.04 
(10, 5, 0.25, 0.7)           27.84        31.14 
(10, 5, 0.5, 0.3)           587.42      2204.08 
(10, 5, 0.5, 0.5)             4.10         4.28 
(10, 5, 0.75, 0.3)            8.35        25.04 


3.4 Representing Multiple BFRs 

Another way to improve the solution coverage of BFRs is to maintain more than one 
BFR. Given multiple BFRs for a single problem, we span more of the solution space 
and therefore retain more solutions. Multiple BFRs can be easily incorporated into the 
original CSP by adding an auxiliary variable (whose domain consists of identifier values 
representing each unique BFR) and n constraints. Each constraint restricts the domain 
of a variable to the backtrack-free values for each particular BFR identifier. Provided 
that we only represent a fixed number of BFRs, such a representation only adds a constant 
factor to the space complexity. Online, all variables are assigned in order except 
the new auxiliary variable, and arc consistency allows the “BFR constraints” to remove 
values when they are no longer consistent with at least one BFR. 
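
The effect of the auxiliary variable can be mimicked directly, without a constraint solver, by tracking which BFRs are still compatible with the assignments made so far. The sketch below assumes each BFR is a list of domain sets over the same variable order and that the original binary constraints are stored as sets of allowed pairs; the name `safe_values` is illustrative.

```python
def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def safe_values(bfrs, constraints, assignment):
    """Values for the next variable that stay inside at least one live BFR.

    bfrs: list of BFRs, each a list of domain sets
    assignment: values already chosen for V_0 .. V_{k-1}, in order
    """
    k = len(assignment)
    # A BFR is still "live" if it contains every value chosen so far; this
    # plays the role of the remaining domain of the auxiliary variable.
    live = [bfr for bfr in bfrs if all(assignment[i] in bfr[i] for i in range(k))]
    candidates = set().union(*(bfr[k] for bfr in live))
    return {b for b in candidates
            if all(allowed(constraints, i, assignment[i], k, b) for i in range(k))}

# Two hypothetical BFRs for X < Y over the domain {1, 2, 3}.
cons = {(0, 1): {(1, 2), (1, 3), (2, 3)}}
bfrs = [[{1}, {2, 3}], [{2}, {3}]]
print(safe_values(bfrs, cons, []), safe_values(bfrs, cons, [2]))
```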

Experiments. To investigate the feasibility of multiple BFRs, we found 10, 50 and 100 
differing BFRs for each (15, 10, 0.7, t) problem. The result of applying this technique 
is shown in Figure 5 in relation to the solutions of the original problem, the number 
of solutions of the best BFR that could be found using probing (Iter1000rand), and the 
number of solutions in a single BFR using random pruning (Random). Representing 
multiple BFRs clearly increases the solution coverage over the best single BFRs we 
were able to find with probing. 

4 Discussion 

In this section we discuss a number of additional issues: extensions that 
we have not yet empirically investigated, the central issue that BFRs do not retain all 
solutions, the role of online consistency enforcement, and the broader perspective that the 
BFR concept offers on a number of aspects of constraint research. 
Fig. 5. Solutions represented by multiple BFRs and one BFR found with probing. 



4.1 k-BFR and Restricted k-BFR 

Instead of pruning values from variable domains, we could add new constraints restrict- 
ing the allowed tuples. In fact, from this perspective BFRB implements 1-BFR: value 
removal corresponds to adding unary constraints. This is one end of a spectrum with 
the other end being adaptive consistency. Between, we can define a range of algorithms 
in which we can add constraints of arity up to k. For an n-variable problem, when k 
is 1 we have the value-removal algorithm; when k is n - 1 we have full adaptive 
consistency. As k increases the space complexity of our BFR increases but so does the 
solution retention. 

A further variation (called restricted k-BFR) addresses the space increase. Rather 
than adding constraints, we only tighten existing constraints. For example, assume 
that the partial assignment X_1 = v_1, ..., X_m = v_m does not extend to a solution and 
a constraint c over X_1, ..., X_m exists. Restricted k-BFR will simply remove the tuple 
(v_1, ..., v_m) from c. In general, constraint c will not exist and therefore the algorithm 
has to consider constraints over subsets of the variables {X_1, ..., X_m}. Given binary 
constraints between some pairs of the variables, we can remove the tuple (v_i, v_j) from 
any constraint where X_i and X_j are involved in the dead-end. A reasonable approach is 
to identify the highest arity constraint involved in a dead-end and remove a single tuple. 
This algorithm will always be applicable, since in the extreme no constraints among the 
parent variables exist and therefore pruning a “tuple” from a unary constraint is equivalent 
to BFRB. A drawback of restricted k-BFR is that it requires extensional constraint 
representations. 
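
A single pruning step of restricted k-BFR might be sketched as follows. The sketch assumes extensional constraints stored as a dictionary from a scope (a tuple of variable indices) to its set of allowed tuples, and a dead-end given as a dictionary from variable to value. The function name is illustrative, the tie-breaking among scopes of equal arity is arbitrary, and protection of a seed solution is omitted for brevity.

```python
def restricted_kbfr_prune(dead_end, domains, constraints):
    """Remove one tuple (or, failing that, one value) that eliminates the dead-end.

    Modifies `constraints` or `domains` in place and returns the scope it tightened.
    """
    # Existing constraints whose scope lies entirely within the dead-end variables
    scopes = [s for s in constraints if set(s) <= set(dead_end)]
    if scopes:
        scope = max(scopes, key=len)                     # highest arity involved
        constraints[scope].discard(tuple(dead_end[v] for v in scope))
        return scope
    # No constraint among the dead-end variables: fall back to value removal,
    # i.e. pruning a "tuple" from a unary constraint as in BFRB.
    var = min(dead_end)
    domains[var].discard(dead_end[var])
    return (var,)

# Hypothetical dead-end over V_0, V_1, V_2 with one binary constraint among them.
dead_end = {0: "a", 1: "b", 2: "c"}
domains = [{"a", "q"}, {"b"}, {"c"}, {"z"}]
constraints = {(0, 1): {("a", "b"), ("a", "x")}, (1, 3): {("b", "z")}}
print(restricted_kbfr_prune(dead_end, domains, constraints), constraints[(0, 1)])
```

Because only existing tuples are deleted, the representation never grows, which is the point of the restricted variant.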

k-BFR and restricted k-BFR suggest a number of research questions. What is the 
increase in solution coverage and quality that can be achieved without extra space 
consumption? If we allow new constraints to be added, how do we choose an appropriate 
k value? How does a good k relate to the arity of the constraints in the original CSP? 
Given the spectrum that exists between 1-BFR and adaptive consistency, we believe 
that future empirical work on these and related questions can be of significant use in 
making adaptive consistency-like techniques more practically applicable. 

4.2 Coming to Terms with Solution Loss 

The central challenge to the BFR concept is that solutions are lost: perfectly good solu- 
tions to the original CSP are not represented in its BFR. A problem transformation that 
loses solutions is a radical, perhaps heretical, concept. To revisit one of our motivating 
examples, failing to represent solutions in an online configuration application means 
that some valid product configurations cannot be achieved. This appears to be a high 
price to pay to remove the need to undo previous decisions. 

There are three characteristics of problem representations that are relevant here: 
space complexity, potential number of backtracks, and solution loss. Simultaneously 
achieving polynomial space complexity, a guarantee of backtrack-free search, and zero 
solution loss implies that P = NP. Therefore, we need to reason about the tradeoffs 
among these characteristics. Precisely how we make these tradeoffs depends on the 
application requirements. BFRs allow us to achieve tradeoffs that are appropriate for 
the application. 

In the autonomous space vehicle application example, backtracking is impossible 
and memory is extremely limited. The tradeoff clearly lies on the side of allowing so- 
lution loss: any solution is better than spending too much time or memory in finding 
the best solution. Therefore, a small number of BFRs that represent a reasonable set of 
solutions is probably the best approach. 

In the interactive configuration application example, space complexity is less of a 
problem and avoiding backtracks is important. Using BFRs, we can create a system that 
allows the system designer and user to collaboratively make the three-way tradeoff in 
two steps. First, the system designer can decide on the space complexity by choosing the 
arity, k, in a k-BFR approach. In the extreme, a single (n - 1)-BFR achieves full adaptive 
consistency and so, if the memory space is available, a zero backtrack, zero solution 
loss BFR can be achieved. The system designer, therefore, makes the decision about 
the tradeoff between the number of solutions that can be found without backtracking 
and the space complexity. Furthermore, the use of seed solutions means that each BFR 
can be built around a solution preferred by the system designer: guaranteeing a minimal 
set of desirable solutions. Second, the BFRs together with the original CSP can be used 
online to allow the user to make the tradeoff between solution loss and backtracks. The 
BFRs create a partition of the domain of each variable: those values that are guaranteed 
to lead to a solution without backtracking and those for which no such guarantee is 
known. These partitions can be presented to the user by identifying the set of “safe” and 
“risky” options for a particular decision. If the user chooses to make safe decisions, the 
BFRs guarantee the existence of a solution. If the user decides to make a risky decision, 
the system can (transparently) transition to standard CSP techniques without solution 
guarantees. This allows the user to decide about the tradeoff between backtracks and 
solution loss: if the user has definite ideas about the desired configuration, more effort 
in terms of undoing previous decisions may be necessary. If the user prefers less effort 
and has weaker preferences, a solution existing in one of the BFRs can be found. 
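
The partition into “safe” and “risky” options can be computed online from the two representations. The sketch below assumes a single BFR, the same hypothetical data layout as in the other sketches (lists of domain sets, binary constraints as sets of allowed pairs), and an illustrative function name. Once the user has made a risky choice, the assignment has left the BFR, so no further option is reported as safe.

```python
def allowed(constraints, i, a, j, b):
    """True if V_i = a and V_j = b are compatible (no constraint means compatible)."""
    if i > j:
        i, a, j, b = j, b, i, a
    rel = constraints.get((i, j))
    return rel is None or (a, b) in rel

def partition_options(original_domains, bfr_domains, constraints, assignment):
    """Split the options for the next variable into (safe, risky) sets."""
    k = len(assignment)
    # Options compatible with the choices made so far in the original problem
    compatible = {b for b in original_domains[k]
                  if all(allowed(constraints, i, assignment[i], k, b) for i in range(k))}
    # The backtrack-free guarantee only holds while every choice is inside the BFR
    inside_bfr = all(assignment[i] in bfr_domains[i] for i in range(k))
    safe = compatible & bfr_domains[k] if inside_bfr else set()
    return safe, compatible - safe

# Hypothetical instance: X < Y, original domains {1, 2, 3}, BFR removes X = 3.
cons = {(0, 1): {(1, 2), (1, 3), (2, 3)}}
original = [{1, 2, 3}, {1, 2, 3}]
bfr = [{1, 2}, {1, 2, 3}]
print(partition_options(original, bfr, cons, []))
```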

The basic BFR concept encompasses solution loss because it allows backtrack-free 
search with no space complexity penalty for those applications where zero backtracks 
and limited memory are hard constraints. When solution loss may be more important, 
the k-BFR concept together with online solving using both the BFR and the original 
representation allows the system designer and the user to collaboratively decide on the 
tradeoff among space complexity, backtrack-free search, and solution loss. 

4.3 Online Consistency Enforcement 

By changing our assumptions about the on-line processing, we can also expand the set 
of techniques applied in finding the BFR. As noted, we are enforcing arc-consistency 
during the creation of a BFR: whenever a value is pruned to remove a dead-end, we en- 
force arc-consistency. When solving a problem online, however, we do no propagation 
as we are guaranteed that the possible values are (globally) consistent with the previous 
assignments. If, instead, we use forward checking or MAC online, we can remove fewer 
dead-ends offline and retain more solutions in a BFR. In creating a BFR, we need to 
remove dead-ends that may be encountered by the online algorithm. If the online algorithm 
itself can avoid some dead-ends (i.e., through use of propagation), they do not 
need to be dealt with in the BFR. This means, in fact, that a backtrack-free representation 
is backtrack-free with respect to the online algorithm: a BFR built for MAC will 
not be a BFR for simple backtracking (though the converse is true). BFRB can be easily 
modified to ensure that only those dead-ends that exist for a specific online algorithm 
will be pruned. 

4.4 Context 

The work on BFRs presents a perspective on a number of fundamental dichotomies in 
constraint processing. 

Inference vs. Search. As in many aspects of constraint computation, the axis that runs 
from inference to search is relevant for BFRs. The basic BFR algorithm allows us to 
perform pure search online without fear of failure; BFRs for online algorithms that use 
some level of inference require more online computation while still ensuring no backtracks 
and preserving more solutions. It would be interesting to study the characteristics 
of BFRs as we increase the level of online consistency processing. 

Implicit vs. Explicit Solutions. BFR models can be viewed along a spectrum of implicit 
versus explicit solution representation, where the original problem lies at one end, and 
the set of explicit solutions at the other. The work on “bundling” solutions provides 
compact representations of sets of solutions. Hubbe & Freuder [8] and Lesaint [9] rep- 
resent sets of solutions as Cartesian products, each one of which might be regarded as 
an extreme form of backtrack-free representation. If we restrict the variable domains 
to one of these Cartesian products, every combination of choices is a solution. All the 
solutions can be represented as a union of these Cartesian products, which suggests that 
we might represent all solutions by a set of distinct BFRs. As we move toward explicit 
representation, the preprocessing cost rises. Usually the space cost does as well, but 
1-BFR and restricted k-BFR are an exception that lets us “have our cake and eat it too”. 

Removing Values vs. Search. Removing values is related in spirit to work on domain 
filtering consistencies [2] though these do not lose solutions. Another spectrum in which 
BFRs play a part therefore is based on the number of values removed. We could envision 
BFRB variations that remove fewer values, allowing more solutions, but also accepting 
some backtracking. Freuder & Hubbe [6] remove solutions in another manner, though 
not for preprocessing, but simply in attempting to search more efficiently. Of course, a 
large body of work on symmetry and interchangeability does this. 

Offline vs. Online Effort. BFRs lie at one end of an axis that increasingly incorporates 
offline preprocessing or precompilation to avoid online execution effort. These issues 
are especially relevant to interactive constraint satisfaction, where human choices alter- 
nate with computer inference, and the same problem representation may be accessed 
repeatedly by different users seeking different solutions. They may also prove increas- 
ingly relevant as decision making fragments among software agents and web services. 
Amilhastre et al. [1] have recently explored interactive constraint solving for configuration, 
compiling the CSP offline into an automaton representing the set of solutions. 

“Customer-Centric” vs. “Vendor-Centric” Preferences. As constraints are increasingly 
applied to online applications, the preferences of the different participants in a transac- 
tion will come to the fore. It will be important to bring soft constraints, preferences and 
priorities, to bear on BFR construction to address the axis that lies between “customer- 
centric” and “vendor-centric” processing. For example, a customer may tell us, or we 
may learn from experience with the customer, that specific options are more important 
to retain. Alternatively, a vendor might prefer to retain an overstocked option, or to 
remove a less profitable one. 



5 Conclusion 

In this paper we identify, for the first time, the three-way tradeoff between space com- 
plexity, backtrack-free search, and solution loss. We presented an approach to obtaining 
a backtrack-free CSP representation that does not require additional space and investi- 
gated a number of variations on the basic algorithm including the use of seed solutions, 
arc-consistency, and a variety of pruning heuristics. We have evaluated experimentally 
the cost of obtaining a BFR and the solution loss for different problem parameters. 
Overall, our results indicate that a significant proportion of the solutions to the origi- 
nal problem can be retained especially when an optimization algorithm that specifically 
searches for such “good” BFRs is used. We have seen how multiple BFRs can cover 
more of the solution space. Furthermore, we have argued that BFRs are an approach 
that allows the system designer and the user to collaboratively control the tradeoff be- 
tween the space complexity of the problem representation, the backtracks that might be 
necessary to find a solution, and the loss of solutions. 

Our approach should prove valuable in real-time process control and online interactive
problem solving, where backtracking is either impossible or impractical. We observed
further that the BFR concept provides an interesting perspective on a number of
theoretical and practical dichotomies within the field of constraint programming,
suggesting directions for future research.

Abstract. This article deals with global constraints for which the set of solutions
can be recognized by an extended finite automaton whose size is bounded
by a polynomial in n, where n is the number of variables of the corresponding 
global constraint. By reformulating the automaton as a conjunction of signature 
and transition constraints we show how to systematically obtain a filtering al- 
gorithm. Under some restrictions on the signature and transition constraints this 
filtering algorithm achieves arc-consistency. An implementation based on some 
constraints as well as on the metaprogramming facilities of SICStus Prolog is 
available. For a restricted class of automata we provide a filtering algorithm for 
the relaxed case, where the violation cost is the minimum number of variables to 
unassign in order to get back to a solution. 

1 Introduction 

Deriving filtering algorithms for global constraints is usually far from obvious and
requires a lot of effort. As a first step toward a methodology for semi-automatic
development of filtering algorithms for global constraints, Carlsson and Beldiceanu
introduced [12] an approach to designing filtering algorithms by derivation from a finite
automaton. As noted in their discussion, constructing the automaton was far from obvious,
since it was mainly done as a rational reconstruction of an emerging understanding
of the necessary case analysis related to the required pruning. However, it is commonly
accepted that coming up with a checker which tests whether a ground instance is a
solution or not is usually straightforward. This was for instance done for constraints
defined in extension, first by Vempaty [29] and later by Amilhastre et al. [1]. This was
also done for arithmetic constraints by Boigelot and Wolper [10]. Within the context of
global constraints on a finite sequence of variables, the recent work of Pesant [25] also
uses a finite automaton for constructing a filtering algorithm. This article focuses on
those global constraints that can be checked by scanning once through their variables
without using any extra data structure.

As a second step toward a methodology for semi-automatic development of filtering 
algorithms, we introduce a new approach which only requires defining a finite automa- 
ton that checks a ground instance. We extend traditional finite automata in order not to 
be limited only to regular expressions. Our first contribution is to show how to reformu- 
late the automaton associated with a global constraint as a conjunction of signature and 
transition constraints. We characterize some restrictions on the signature and transition 
constraints under which the filtering algorithm induced by this reformulation achieves 
arc-consistency and apply this new methodology to the two following problems: 1. The 
design of filtering algorithms for a fairly large set of global constraints. 2. The design of 
filtering algorithms for handling the conjunction of several global constraints. While the 
works of Amilhastre et al. and Pesant both rely on simple automata and use an ad-hoc 
filtering algorithm, our approach is based on automata with counters and reformulation. 
As a consequence we can model a larger class of global constraints and prove properties 
on the consistency by reasoning directly on the constraint hypergraph. 

Our second contribution is to provide, for a restricted class of automata, a filtering
algorithm for the relaxed case. This technique relies on the variable-based violation
cost introduced in [26,23]. This cost was advocated as a generic way of expressing
the violation of a global constraint. However, algorithms were only provided for the
soft_alldifferent constraint [26]. We come up with an algorithm for computing
a sharp bound of the minimum violation cost and with a filtering algorithm that prunes
values so as not to exceed a given maximum violation cost.

Section 2 describes the kind of finite automaton used for recognizing the set of so- 
lutions associated with a global constraint. Section 3 shows how to come up with a 
filtering algorithm which exploits the previously introduced automaton. Section 4
describes typical applications of this technique. Finally, for a restricted class of automata,
Section 5 provides a filtering algorithm for the relaxed case.



2 Description of the Automaton Used 
for Checking Ground Instances 

We first discuss the main issues behind the task of selecting what kind of automa- 
ton to consider for expressing in a concise way the set of solutions associated with a 
global constraint. We consider global constraints for which any ground instance can be 
checked in linear time by scanning once through their variables without using any data 
structure. In order to concretely illustrate this point we first select a set of global con- 
straints and write down a checker for each of them. Finally, we give for each checker 
a sketch of the corresponding automaton. Based on these observations, we define the 
type of automaton we will use. 

Selecting an Appropriate Description. As we previously said, we focus on those
global constraints that can be checked by scanning once through their variables. This is
for instance the case of element [19], minimum [3], pattern [11], global_contiguity
[22], lexicographic ordering [17], among [6] and inflection [2]. Since they illustrate
key points needed for characterizing the set of solutions associated with a global
constraint, our discussion will be based on the last four constraints, for which we now
recall the definitions:

- The global_contiguity(vars) constraint enforces the sequence of 0-1
variables vars to have at most one group of consecutive 1s. For instance, the
constraint global_contiguity([0, 1, 1, 0]) holds since we have only one group of
consecutive 1s.

- The lexicographic ordering constraint \(\vec{x} \leq_{lex} \vec{y}\) over two vectors of
variables \(\vec{x} = (x_0, \ldots, x_{n-1})\) and \(\vec{y} = (y_0, \ldots, y_{n-1})\) holds iff
\(n = 0\), or \(x_0 < y_0\), or \(x_0 = y_0\) and \((x_1, \ldots, x_{n-1}) \leq_{lex} (y_1, \ldots, y_{n-1})\).

- The among(nvar, vars, values) constraint restricts the number of variables of the
sequence of variables vars, which take their value in a given set values, to be equal
to the variable nvar. For instance, among(3, [4, 5, 5, 4, 1], [1, 5, 8]) holds since
exactly 3 values of the sequence 45541 are located in {1, 5, 8}.

- The inflection(ninf, vars) constraint enforces the number of inflections of
the sequence of variables vars to be equal to the variable ninf. An inflection
is described by one of the following patterns: a strict increase followed by a strict
decrease or, conversely, a strict decrease followed by a strict increase. For instance,
inflection(4, [3, 3, 1, 4, 5, 5, 6, 5, 5, 6, 3]) holds since we can extract from the
sequence 33145565563 the four subsequences 314, 565, 6556 and 563, which all
follow one of these two patterns.

global_contiguity(vars[0..n-1]): BOOLEAN
BEGIN
  i=0;
  WHILE i<n AND vars[i]=0 DO i++;
  WHILE i<n AND vars[i]=1 DO i++;
  WHILE i<n AND vars[i]=0 DO i++;
  RETURN (i=n);
END.
(A1)

≤lex(x[0..n-1], y[0..n-1]): BOOLEAN
BEGIN
  i=0;
  WHILE i<n AND x[i]=y[i] DO i++;
  RETURN (i=n OR x[i]<y[i]);
END.
(B1)

among(nvar, vars[0..n-1], values): BOOLEAN
BEGIN
  i=0; c=0;
  WHILE i<n DO
    IF vars[i] in values THEN c++;
    i++;
  RETURN (nvar=c);
END.
(C1)

inflection(ninf, vars[0..n-1]): BOOLEAN
BEGIN
  i=0; c=0;
  WHILE i<n-1 AND vars[i]=vars[i+1] DO i++;
  IF i<n-1 THEN less=(vars[i]<vars[i+1]);
  WHILE i<n-1 DO
    IF less THEN
      IF vars[i]>vars[i+1] THEN c++; less=FALSE;
    ELSE
      IF vars[i]<vars[i+1] THEN c++; less=TRUE;
    i++;
  RETURN (ninf=c);
END.
(D1)

[Automaton diagrams (A2) global_contiguity, (B2) ≤lex, (C2) among and (D2) inflection are not reproduced.]

Fig. 1. Four checkers and their corresponding automata.
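
The four checkers of Fig. 1 can be transcribed almost line for line into an executable form. The following is a minimal sketch in Python, not the authors' code: the function names (in particular `lex_lesseq` for the lexicographic ordering checker) and the use of Python lists for the sequences are assumptions made for illustration.

```python
def global_contiguity(vars):
    """Checker (A1): at most one group of consecutive 1s in a 0-1 sequence."""
    n, i = len(vars), 0
    while i < n and vars[i] == 0:
        i += 1
    while i < n and vars[i] == 1:
        i += 1
    while i < n and vars[i] == 0:
        i += 1
    return i == n


def lex_lesseq(x, y):
    """Checker (B1): x is lexicographically less than or equal to y."""
    n, i = len(x), 0
    while i < n and x[i] == y[i]:
        i += 1
    return i == n or x[i] < y[i]


def among(nvar, vars, values):
    """Checker (C1): exactly nvar variables take their value in values."""
    c = 0
    for v in vars:
        if v in values:
            c += 1
    return nvar == c


def inflection(ninf, vars):
    """Checker (D1): the sequence has exactly ninf inflections."""
    n, i, c = len(vars), 0, 0
    # Skip the initial plateau of equal values.
    while i < n - 1 and vars[i] == vars[i + 1]:
        i += 1
    # Direction of the first strict change (False if there is none).
    less = i < n - 1 and vars[i] < vars[i + 1]
    while i < n - 1:
        if less:
            if vars[i] > vars[i + 1]:
                c += 1
                less = False
        else:
            if vars[i] < vars[i + 1]:
                c += 1
                less = True
        i += 1
    return ninf == c
```

Each function scans its arguments once and keeps at most a counter and a flag, which is the property the section relies on when turning a checker into an automaton.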



Parts (A1), (B1), (C1) and (D1) of Fig. 1 depict the four checkers respectively associated
with global_contiguity, with \(\leq_{lex}\), with among and with inflection. For
each checker we observe the following facts:
- Within the checker depicted by part (A1) of Fig. 1, the values of the sequence
vars[0], ..., vars[n-1] are successively compared against 0 and 1 in order to
check that we have at most one group of consecutive 1s. This can be translated to
the automaton depicted by part (A2) of Fig. 1. The automaton takes as input the
sequence vars[0], ..., vars[n-1], and successively triggers a transition for each
term of this sequence. Transitions labeled by 0, 1 and $ are respectively associated
with the conditions vars[i] = 0, vars[i] = 1 and i = n. Transitions leading to
failure are systematically skipped. This is why no transition labeled with a 1 starts
from state z.

- Within the checker given by part (B1) of Fig. 1, the components of vectors \(\vec{x}\)
and \(\vec{y}\) are scanned in parallel. We first skip all the components that are equal and
then perform a final check. This is represented by the automaton depicted by part
(B2) of Fig. 1. The automaton takes as input the sequence (x[0], y[0]), ...,
(x[n-1], y[n-1]) and triggers a transition for each term of this sequence. Unlike the
global_contiguity constraint, some transitions now correspond to a condition
(e.g. x[i] = y[i], x[i] < y[i]) between two variables of the \(\leq_{lex}\) constraint.

- Observe that the among(nvar, vars, values) constraint involves a variable nvar
whose value is computed from a given collection of variables vars. The checker
depicted by part (C1) of Fig. 1 counts the number of variables of vars[0], ...,
vars[n-1] that take their value in values. For this purpose it uses a counter c,
which is finally tested against the value of nvar. This convinced us to allow
the use of counters in an automaton. Each counter has an initial value which can
be updated while triggering certain transitions. The final state of an automaton can
force a variable of the constraint to be equal to a given counter. Part (C2) of
Fig. 1 describes the automaton corresponding to the code given in part (C1) of the
same figure. The automaton uses the counter c, initially set to 0, and takes as input
the sequence vars[0], ..., vars[n-1]. It triggers a transition for each variable of
this sequence and increments c when the corresponding variable takes its value in
values. The final state returns a success when the value of c is equal to nvar. At
this point we want to stress the following fact: it would have been possible to use
an automaton which avoids the use of counters. However, this automaton would
depend on the actual value of the parameter nvar. In addition, it would require
more states than the automaton of part (C2) of Fig. 1. This is typically a problem if
we want to have a fixed number of states in order to save memory as well as time.

- Like the among constraint, the inflection(ninf, vars) constraint involves a
variable ninf whose value is computed from a given sequence of variables
vars[0], ..., vars[n-1]. Therefore, the checker depicted in part (D1) of Fig. 1
also uses a counter c for counting the number of inflections, and compares its final
value to the ninf parameter. This program is represented by the automaton depicted
by part (D2) of Fig. 1. It takes as input the sequence of pairs (vars[0], vars[1]),
(vars[1], vars[2]), ..., (vars[n-2], vars[n-1]) and triggers a transition for each
pair. Observe that a given variable may occur in more than one pair. Each transition
compares the respective values of two consecutive variables of vars[0..n-1] and
increments the counter c when a new inflection is detected. The final state returns a
success when the value of c is equal to ninf.

Synthesizing all the observations we got from these examples leads to the following
remarks and definitions for a given global constraint C:

- If, for a given state, no transition can be triggered, then the constraint C does
not hold.

- Since all transitions starting from a given state are mutually incompatible, all
automata are deterministic. Let \(\mathcal{M}\) denote the set of mutually incompatible conditions
associated with the different transitions of an automaton.

- Let \(A_0, \ldots, A_{m-1}\) denote the sequence of subsets of variables of C on which the
transitions are successively triggered. All these subsets contain the same number of
elements and refer to some variables of C. Since these subsets typically depend on
the constraint, we leave the computation of \(A_0, \ldots, A_{m-1}\) outside the automaton.
To each subset \(A_i\) of this sequence corresponds a variable \(S_i\) with an initial domain
ranging over \([min, min + |\mathcal{M}| - 1]\), where min is a fixed integer. To each integer
of this range corresponds one of the mutually incompatible conditions of \(\mathcal{M}\). The
sequences \(S_0, \ldots, S_{m-1}\) and \(A_0, \ldots, A_{m-1}\) are respectively called the signature
and the signature argument of the constraint. The constraint between \(S_i\) and the
variables of \(A_i\) is called the signature constraint and is denoted by \(\Psi_C(S_i, A_i)\).

- From a pragmatic point of view, the task of writing a constraint checker is
naturally done by writing down an imperative program where local variables (i.e.,
counters), assignment statements and control structures are used. This suggested that
we consider deterministic finite automata augmented with counters and assignment
statements on these counters. Regarding control structures, we did not introduce
any extra feature since the deterministic choice of which transition to trigger next
seemed to be good enough.

- Many global constraints involve a variable whose value is computed from a given 
collection of variables. This convinced us to allow the final state of an automaton 
to optionally return a result. In practice, this result corresponds to the value of a 
counter of the automaton in the final state. 

Defining an Automaton. An automaton \(\mathcal{A}\) of a constraint C is defined by a sextuple

(Signature, SignatureDomain, SignatureArg, Counters, States, Transitions)

where:

- Signature is the sequence of variables \(S_0, \ldots, S_{m-1}\) corresponding to the
signature of the constraint C.

- SignatureDomain is an interval which defines the range of possible values of the
variables of Signature.

- SignatureArg is the signature argument \(A_0, \ldots, A_{m-1}\) of the constraint C. The
link between the variables of \(A_i\) and the variable \(S_i\) (\(0 \leq i < m\)) is done by
writing down the signature constraint \(\Psi_C(S_i, A_i)\) in such a way that arc-consistency
is achieved. In our context this is done by using standard features of the CLP(FD)
solver of SICStus Prolog [13] such as arithmetic constraints between two variables,
propositional combinators or the global constraints programming interface.







Nicolas Beldiceanu, Mats Carlsson, and Thierry Petit 



- Counters is the, possibly empty, list of all counters used in the automaton \(\mathcal{A}\). Each
counter is described by a term t(Counter, InitialValue, FinalVariable) where
Counter is a symbolic name representing the counter, InitialValue is an integer
giving the value of the counter in the initial state of \(\mathcal{A}\), and FinalVariable gives the
variable that should be unified with the value of the counter in the final state of \(\mathcal{A}\).

- States is the list of states of \(\mathcal{A}\), where each state has the form source(id), sink(id)
or node(id), and id is a unique identifier associated with each state. Finally, source(id)
and sink(id) respectively denote the initial and the final state of \(\mathcal{A}\).

- Transitions is the list of transitions of \(\mathcal{A}\). Each transition t has the form
arc(id_1, label, id_2) or arc(id_1, label, id_2, counters). id_1 and id_2 respectively
correspond to the state just before and just after t, while label depicts the value that the
signature variable should have in order to trigger t. When used, counters gives for
each counter of Counters its value after firing the corresponding transition. This
value is specified by an arithmetic expression involving counters, constants, as well
as usual arithmetic functions such as +, -, min or max. The order used in the
counters list is identical to the order used in Counters.

Example 1. As an illustrative example we give the description of the automaton
associated with the inflection(ninf, vars) constraint. We have:

- Signature = \(S_0, S_1, \ldots, S_{n-2}\),

- SignatureDomain = 0..2,

- SignatureArg = (vars[0], vars[1]), ..., (vars[n-2], vars[n-1]),

- Counters = t(c, 0, ninf),

- States = [source(s), node(i), node(j), sink(t)],

- Transitions = [arc(s, 1, s), arc(s, 2, i), arc(s, 0, j), arc(s, $, t), arc(i, 1, i), arc(i, 2, i),
arc(i, 0, j, [c+1]), arc(i, $, t), arc(j, 1, j), arc(j, 0, j), arc(j, 2, i, [c+1]), arc(j, $, t)].

The signature constraint relating each pair of variables (vars[i], vars[i+1]) to
the signature variable \(S_i\) is defined as follows:

\[
\Psi_{inflection}(S_i, vars[i], vars[i+1]) \equiv
(vars[i] > vars[i+1] \Leftrightarrow S_i = 0) \wedge
(vars[i] = vars[i+1] \Leftrightarrow S_i = 1) \wedge
(vars[i] < vars[i+1] \Leftrightarrow S_i = 2).
\]

The sequence of transitions triggered on the ground instance
inflection(4, [3, 3, 1, 4, 5, 5, 6, 5, 5, 6, 3]) is

\[
\begin{array}{l}
s \xrightarrow{3=3 \,\Leftrightarrow\, S_0=1} s
\xrightarrow{3>1 \,\Leftrightarrow\, S_1=0} j
\xrightarrow[c=1]{1<4 \,\Leftrightarrow\, S_2=2} i
\xrightarrow{4<5 \,\Leftrightarrow\, S_3=2} i
\xrightarrow{5=5 \,\Leftrightarrow\, S_4=1} i
\xrightarrow{5<6 \,\Leftrightarrow\, S_5=2} i \\
\quad \xrightarrow[c=2]{6>5 \,\Leftrightarrow\, S_6=0} j
\xrightarrow{5=5 \,\Leftrightarrow\, S_7=1} j
\xrightarrow[c=3]{5<6 \,\Leftrightarrow\, S_8=2} i
\xrightarrow[c=4]{6>3 \,\Leftrightarrow\, S_9=0} j
\xrightarrow{\$} t : ninf = 4.
\end{array}
\]

Each transition gives the corresponding condition and, where applicable, the value
of the counter c just after firing that transition.
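
The run of Example 1 can be replayed mechanically. The sketch below is an illustration only, not the authors' SICStus Prolog encoding: the dictionary `ARCS`, the helper names `signature` and `run`, and the choice to store the counter update `[c+1]` as an integer increment are all assumptions of this example.

```python
END = "$"  # end-of-input symbol of the automaton

# (state, signature letter) -> (next state, increment applied to counter c).
# The arcs are those listed in Example 1; [c+1] is stored as increment 1.
ARCS = {
    ("s", 1): ("s", 0), ("s", 2): ("i", 0), ("s", 0): ("j", 0), ("s", END): ("t", 0),
    ("i", 1): ("i", 0), ("i", 2): ("i", 0), ("i", 0): ("j", 1), ("i", END): ("t", 0),
    ("j", 1): ("j", 0), ("j", 0): ("j", 0), ("j", 2): ("i", 1), ("j", END): ("t", 0),
}


def signature(a, b):
    """Signature constraint on a ground pair: 0 if a > b, 1 if a = b, 2 if a < b."""
    return 0 if a > b else (1 if a == b else 2)


def run(vars):
    """Return the visited states and the final value of the counter c."""
    state, c = "s", 0
    trace = [state]
    letters = [signature(a, b) for a, b in zip(vars, vars[1:])] + [END]
    for letter in letters:
        state, inc = ARCS[(state, letter)]
        c += inc
        trace.append(state)
    return trace, c


trace, c = run([3, 3, 1, 4, 5, 5, 6, 5, 5, 6, 3])
print(" -> ".join(trace), "| c =", c)
```

On the ground instance of Example 1 the final counter value is 4, which is the value that the sink state unifies with ninf.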



3 Filtering Algorithm 

The filtering algorithm is based on the following idea. For a given global constraint C,
one can think of its automaton as a procedure that repeatedly maps a current state \(Q_i\) and
counter vector \(K_i\), given a signature variable \(S_i\), to a new state \(Q_{i+1}\) and counter vector
\(K_{i+1}\), until a terminal state is reached. We then convert this procedure into a transition
constraint \(\Phi_C(Q_i, K_i, S_i, Q_{i+1}, K_{i+1})\) as follows. \(Q_i\) is a variable whose values
correspond to the states that can be reached at step i. Similarly, \(K_i\) is a vector of variables
whose values correspond to the potential values of the counters at step i. Assuming that
the automaton associated with C has na arcs \(arc(q_1, s_1, q'_1, f_1(K)), \ldots, arc(q_{na}, s_{na}, q'_{na}, f_{na}(K))\),
the transition constraint has the following form, implemented in SICStus
Prolog¹ with arithmetic, case², and element constraints [13]:

\[
\bigvee \left\{ \begin{array}{l}
(Q_i = q_1) \wedge (S_i = s_1) \wedge (Q_{i+1} = q'_1) \wedge (K_{i+1} = f_1(K_i)) \\
\quad \vdots \\
(Q_i = q_{na}) \wedge (S_i = s_{na}) \wedge (Q_{i+1} = q'_{na}) \wedge (K_{i+1} = f_{na}(K_i))
\end{array} \right.
\]

We can then arrive at a filtering algorithm for C by decomposing it into a conjunction
of \(\Phi_C\) constraints, “threading” the state and counter variables through the conjunction.
In addition to this, we need the signature constraints \(\Psi_C(S_i, A_i)\) (\(0 \leq i < m\)) that relate
each signature variable \(S_i\) to the variables of its corresponding signature argument
\(A_i\). Filtering for the constraint C is provided by the conjunction of all signature and
transition constraints (s being the start state and t being the end state):

\[
\begin{array}{l}
\Psi_C(S_0, A_0) \wedge \Phi_C(s, K_0, S_0, Q_1, K_1) \wedge \\
\Psi_C(S_1, A_1) \wedge \Phi_C(Q_1, K_1, S_1, Q_2, K_2) \wedge \\
\quad \vdots \\
\Psi_C(S_{m-1}, A_{m-1}) \wedge \Phi_C(Q_{m-1}, K_{m-1}, S_{m-1}, Q_m, K_m) \wedge \\
\Phi_C(Q_m, K_m, \$, t, K_{m+1})
\end{array}
\]





Fig. 2. Automata and decision trees for (A) \(\leq_{lex}\) and (B) among. 



A couple of examples will help clarify this idea. Note that the decision tree needs 
to correctly handle the case when the terminal state has already been reached. 

1 In ECLiPSe one could typically use Propia [27], a library supporting generalized propagation, 
for encoding the transition constraint. 

2 When no counter is used we only need a single case constraint to encode the disjunction 
expressed by the transition constraint. This disjunction is expressed as a decision tree as illus- 
trated by Fig. 2. Since the case constraint [13, page 463] achieves arc-consistency, it follows 
that, in this context, the transition constraint achieves arc-consistency. 

Example 2. Consider a \(\vec{x} \leq_{lex} \vec{y}\) constraint over vectors of length n. First, we need
a signature constraint \(\Psi_{\leq_{lex}}\) relating each pair of arguments x[i], y[i] to a signature
variable \(S_i\). This can be done as follows:

\[
\Psi_{\leq_{lex}}(S_i, x[i], y[i]) \equiv
(x[i] < y[i] \Leftrightarrow S_i = 1) \wedge
(x[i] = y[i] \Leftrightarrow S_i = 2) \wedge
(x[i] > y[i] \Leftrightarrow S_i = 3).
\]

The automaton of \(\leq_{lex}\) and the decision tree corresponding to the transition
constraint \(\Phi_{\leq_{lex}}\) are shown in part (A) of Fig. 2.

Example 3. Consider an among(nvar, vars, values) constraint. First, we need a
signature constraint \(\Psi_{among}\) relating each argument vars[i] to a signature letter \(S_i\). This
can be done as follows:

\[
\Psi_{among}(S_i, vars[i], values) \equiv
(vars[i] \in values \Leftrightarrow S_i = 1) \wedge
(vars[i] \notin values \Leftrightarrow S_i = 0).
\]

The automaton of among and the decision tree corresponding to the transition
constraint \(\Phi_{among}\) are shown in part (B) of Fig. 2.

Consistency. We consider automata where all subsets of variables in SignatureArg are
pairwise disjoint, and that do not involve counters. Many constraints can be encoded by
such automata, for instance the global_contiguity and lex_lesseq constraints
presented in Fig. 1. For this kind of automata the filtering algorithm achieves
arc-consistency, provided that the filtering algorithms of signature and transition constraints
also achieve arc-consistency. To prove this property, consider the constraint hypergraph
that represents the conjunction of all signature and transition constraints (see Fig. 3). It
has two particular properties: there is no cycle in the corresponding intersection graph³,
and for any pair of constraints the two sets of involved variables share at most one
variable. Such a hypergraph is called Berge-acyclic [9]. Berge-acyclic constraint




Fig. 3. Constraint hypergraph of the conjunction of transition and signature constraints in the case
of disjoint SignatureArg sets. The i-th SignatureArg set \(A_i\) is denoted by \(\{X_i, Y_i, \ldots\}\).



networks have been proved to be solvable in polynomial time by achieving arc-consistency [20,
21]. Therefore, if all signature and transition constraints achieve arc-consistency then
we obtain a complete filtering for our global constraint.
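
To make the effect of this complete filtering concrete, the following sketch computes, for a counter-free automaton whose signature letters are the variable values themselves, the values that lie on at least one accepting path. It is not the decomposition into signature and transition constraints used in the paper; it is a direct forward/backward reachability computation that yields the same pruned domains under the stated restrictions. The arc list `ARCS` is an assumed encoding of the global_contiguity automaton (state names s, n, z, t are chosen for illustration), and `filter_domains` is a hypothetical helper.

```python
END = "$"

# Assumed encoding of the global_contiguity automaton: s = only 0s seen so far,
# n = inside the group of 1s, z = 0s after the group, t = sink.
ARCS = [("s", 0, "s"), ("s", 1, "n"), ("n", 1, "n"), ("n", 0, "z"),
        ("z", 0, "z"), ("s", END, "t"), ("n", END, "t"), ("z", END, "t")]


def filter_domains(domains, arcs=ARCS, source="s", sink="t"):
    """Return the domains restricted to values lying on some accepting path."""
    m = len(domains)
    letters = [set(d) for d in domains] + [{END}]
    # Forward pass: fwd[i] = states reachable from the source after i letters.
    fwd = [{source}]
    for dom in letters:
        fwd.append({q2 for (q1, a, q2) in arcs if q1 in fwd[-1] and a in dom})
    # Backward pass: bwd[i] = reachable states from which the sink can be reached.
    bwd = [set() for _ in range(m + 2)]
    bwd[m + 1] = {sink} & fwd[m + 1]
    for i in range(m, -1, -1):
        bwd[i] = {q1 for (q1, a, q2) in arcs
                  if q1 in fwd[i] and a in letters[i] and q2 in bwd[i + 1]}
    # A value is supported iff one of its arcs links two surviving states.
    return [{a for (q1, a, q2) in arcs
             if q1 in bwd[i] and a in domains[i] and q2 in bwd[i + 1]}
            for i in range(m)]


print(filter_domains([{0, 1}, {1}, {0, 1}, {1}, {0, 1}]))
```

In this example the third variable loses the value 0, since a 0 between two 1s would create two groups of consecutive 1s; the first and last variables keep both values.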

Performance. It is reasonable to ask whether the filtering algorithm described
herein performs anywhere near as well as a hard-coded implementation of a given
constraint. To this end, we have compared a version of the
Balanced Incomplete Block Design problem [18, prob028] that uses a built-in \(\leq_{lex}\)
constraint to break column symmetries with a version using our filtering based on a
3 In this graph each vertex corresponds to a constraint and there is an edge between two vertices 
iff the sets of variables involved in the two corresponding constraints intersect. 

Table 1. Time in milliseconds for finding (A) the first solution of BIBD instances using built-in
vs. simulated \(\leq_{lex}\) (BCS denotes time spent for breaking column symmetries: with respect to
the first column, BCS corresponds to the time spent in the built-in \(\leq_{lex}\) constraint), and (B) all
solutions to a single built-in vs. simulated \(\leq_{lex}\) constraint.

(A)

Problem              Built-in \(\leq_{lex}\)     Simulated \(\leq_{lex}\)
v, b, r, k, λ        BCS/Other                   BCS/Other
6, 50, 25, 3, 10     70/170                      250/170
6, 60, 30, 3, 12     120/110                     50/110
8, 14, 7, 4, 3       10/80                       50/80
9, 120, 40, 3, 10    480/1090                    440/1090
10, 90, 27, 3, 6     550/90                      1010/90
10, 120, 36, 3, 8    1400/2070                   1040/2070
12, 88, 22, 3, 4     450/970                     530/970
13, 104, 24, 3, 4    540/1230                    540/1230
15, 70, 14, 3, 2     220/910                     520/910

(B)

\(x_i \in [0, m-1]\), \(y_i = m - i\)

\(|\vec{x}| = |\vec{y}| = m\)    Built-in \(\leq_{lex}\)     Simulated \(\leq_{lex}\)
(illegible)                      10                          20
(illegible)                      110                         170
(illegible)                      1640                        2300
(illegible)                      29530                       39100

finite automaton for the same constraint. In a second experiment, we measured the time
to find all solutions to a single \(\leq_{lex}\) constraint. The experiments were run in SICStus
Prolog 3.11 on a 600MHz Pentium III. The results are shown in Table 1.



4 Applications of This Technique 

Designing Filtering Algorithms for Global Constraints. We apply this new methodology
to design filtering algorithms for a fairly large set of global constraints.
We came up with an automaton⁴ for the following constraints: 1. Unary constraints
specifying a domain like in [14] or not_in [16]. 2. Channeling constraints
like domain_constraint [28]. 3. Counting constraints for constraining the number
of occurrences of a given set of values like among [6], atleast [16], atmost [16]
or count [14]. 4. Sliding sequence constraints like change [4], longest_change
or smooth [2]. longest_change(size, vars, ctr) restricts the variable size to the
maximum number of consecutive variables of vars for which the binary constraint ctr
holds. 5. Variations around the element constraint [19] like element_greatereq
[24], element_lesseq [24] or element_sparse [16]. 6. Variations around the
maximum constraint [3] like max_index(vars, index). max_index enforces the
variable index to be equal to one of the positions of variables corresponding to the
maximum value of the variables of vars. 7. Constraints on words like global_contiguity
[22], group [16], group_skip_isolated_item [2] or pattern [11]. 8. Constraints
between vectors of variables like between [12], \(\leq_{lex}\) [17], lex_different
or differ_from_at_least_k_pos. Given two vectors \(\vec{x}\) and \(\vec{y}\) which have the
same number of components, the constraints lex_different(\(\vec{x}\), \(\vec{y}\)) and
differ_from_at_least_k_pos(k, \(\vec{x}\), \(\vec{y}\)) respectively enforce the vectors \(\vec{x}\) and \(\vec{y}\) to
differ in at least 1 and k components. 9. Constraints between n-dimensional boxes like

4 These automata are available in the technical report [5]. All signature constraints are encoded 
in order to achieve arc-consistency. 

two_quad_are_in_contact [16] or two_quad_do_not_overlap [7]. 10. Constraints
on the shape of a sequence of variables like inflection [2], top [8] or
valley [8]. 11. Various constraints like in_same_partition(var_1, var_2,
partitions), not_all_equal(vars) or sliding_card_skip0(atleast, atmost,
vars, values). in_same_partition enforces variables var_1 and var_2 to be
respectively assigned to two values that both belong to a same sublist of values of partitions.
not_all_equal enforces the variables of vars to take more than a single value.
sliding_card_skip0 enforces that each maximum non-zero subsequence of
consecutive variables of vars contains at least atleast and at most atmost values from the set of
values values.



Filtering Algorithm for a Conjunction of Global Constraints. Another typical use
of our new methodology is to come up with a filtering algorithm for the conjunction
of several global constraints. This is usually not easy since it implies analyzing a
lot of special cases arising from the interaction of the different constraints
considered. We illustrate this point on the conjunction of the between(\(\vec{a}\), \(\vec{x}\), \(\vec{b}\)) [12]
and the exactly_one(\(\vec{x}\), values) constraints, for which we come up with a filtering
algorithm that maintains arc-consistency. The between constraint holds iff \(\vec{a} \leq_{lex} \vec{x}\)
and \(\vec{x} \leq_{lex} \vec{b}\), while the exactly_one constraint holds if exactly one component of
\(\vec{x}\) takes its value in the set of values values.

The left-hand part of Fig. 4 depicts the two automata \(\mathcal{A}_1\) and \(\mathcal{A}_2\) respectively
associated with the between and the exactly_one constraints, while the right-hand
part gives the automaton \(\mathcal{A}_3\) associated with the conjunction of these two constraints.
\(\mathcal{A}_3\) corresponds to the product of \(\mathcal{A}_1\) and \(\mathcal{A}_2\). States of \(\mathcal{A}_3\) are labeled by the two states
of \(\mathcal{A}_1\) and \(\mathcal{A}_2\) from which they were issued. Transitions of \(\mathcal{A}_3\) are labeled by the end symbol $ or
by a conjunction of elementary conditions, where each condition is taken in one of the
following sets of conditions: \(\{a_i < x_i, a_i = x_i, a_i > x_i\}\), \(\{b_i > x_i, b_i = x_i, b_i < x_i\}\),
\(\{x_i \in values, x_i \notin values\}\). This makes up to 3 · 3 · 2 = 18 possible combinations and
leads to the signature constraint \(\Psi_{between \wedge exactly\_one}(S_i, a_i, x_i, b_i, values)\) between
the signature variable \(S_i\) and the i-th component of vectors \(\vec{a}\), \(\vec{x}\) and \(\vec{b}\):



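\[
S_i = \begin{cases}
0 & \text{if } a_i < x_i \wedge b_i > x_i \wedge x_i \notin values, \\
1 & \text{if } a_i < x_i \wedge b_i = x_i \wedge x_i \notin values, \\
2 & \text{if } a_i < x_i \wedge b_i < x_i \wedge x_i \notin values, \\
3 & \text{if } a_i = x_i \wedge b_i > x_i \wedge x_i \notin values, \\
4 & \text{if } a_i = x_i \wedge b_i = x_i \wedge x_i \notin values, \\
5 & \text{if } a_i = x_i \wedge b_i < x_i \wedge x_i \notin values, \\
6 & \text{if } a_i > x_i \wedge b_i > x_i \wedge x_i \notin values, \\
7 & \text{if } a_i > x_i \wedge b_i = x_i \wedge x_i \notin values, \\
8 & \text{if } a_i > x_i \wedge b_i < x_i \wedge x_i \notin values, \\
9 & \text{if } a_i < x_i \wedge b_i > x_i \wedge x_i \in values, \\
10 & \text{if } a_i < x_i \wedge b_i = x_i \wedge x_i \in values, \\
11 & \text{if } a_i < x_i \wedge b_i < x_i \wedge x_i \in values, \\
12 & \text{if } a_i = x_i \wedge b_i > x_i \wedge x_i \in values, \\
13 & \text{if } a_i = x_i \wedge b_i = x_i \wedge x_i \in values, \\
14 & \text{if } a_i = x_i \wedge b_i < x_i \wedge x_i \in values, \\
15 & \text{if } a_i > x_i \wedge b_i > x_i \wedge x_i \in values, \\
16 & \text{if } a_i > x_i \wedge b_i = x_i \wedge x_i \in values, \\
17 & \text{if } a_i > x_i \wedge b_i < x_i \wedge x_i \in values.
\end{cases}
\]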
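
The 18 cases follow a regular pattern: the three elementary conditions can be read as digits of a mixed-radix number. The sketch below evaluates the signature letter on ground values under that reading; the function name `signature` and the three intermediate codes are assumptions introduced for illustration, and the code only checks ground values (it does no filtering).

```python
def signature(a_i, x_i, b_i, values):
    """Signature letter S_i in 0..17 for ground a_i, x_i, b_i and a set values."""
    a_code = 0 if a_i < x_i else (1 if a_i == x_i else 2)  # a_i <, =, > x_i
    b_code = 0 if b_i > x_i else (1 if b_i == x_i else 2)  # b_i >, =, < x_i
    in_code = 1 if x_i in values else 0                    # x_i in values or not
    return 9 * in_code + 3 * a_code + b_code


# Enumerate all 18 combinations of the three elementary conditions.
letters = {signature(a, 1, b, vals)
           for a in (0, 1, 2) for b in (0, 1, 2) for vals in (set(), {1})}
print(sorted(letters))
```

For instance, a_i = x_i, b_i < x_i and x_i outside values gives 9·0 + 3·1 + 2 = 5, in agreement with the case table.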



In order to achieve arc-consistency on the conjunction of the between(\(\vec{a}\), \(\vec{x}\), \(\vec{b}\))
and the exactly_one(\(\vec{x}\), values) constraints we need to have arc-consistency on
\(\Psi_{between \wedge exactly\_one}(S_i, a_i, x_i, b_i, values)\). In our context this is done by using the
global constraint programming facilities of SICStus Prolog [14]⁵.



5 The corresponding code is available in the technical report [5].

Fig. 4. Automata associated with between and exactly_one and the automaton associated
with their conjunction.



Example 4. Consider three variables $x \in \{0, 1\}$, $y \in \{0, 3\}$, $z \in \{0, 1, 2, 3\}$ subject to
the conjunction of constraints between((0, 3, 1), (x, y, z), (1, 0, 2)) $\wedge$ exactly_one
((x, y, z), {0}). Even if both the between and the exactly_one constraints achieve
arc-consistency, we need the automaton associated with their conjunction to find out
that $z \neq 0$. This can be seen as follows: after two transitions, the automaton $A_3$ will be
either in state ai or in state bi. However, in either state, a 0 must already have been
seen, and so there is no support for z = 0.
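
The claim of Example 4 can be checked by brute force. The sketch below is only an illustration: it assumes that between(a, x, b) holds iff a ≤ x ≤ b in lexicographic order and that exactly_one(x, values) holds iff exactly one component of x takes its value in values. Both readings, and all function names, are assumptions of this sketch rather than definitions taken from the text above.

```python
from itertools import product


def between(a, x, b):
    # Assumed semantics: a <=lex x <=lex b (Python compares tuples lexicographically).
    return a <= x <= b


def exactly_one(x, values):
    # Assumed semantics: exactly one component of x takes a value in `values`.
    return sum(v in values for v in x) == 1


DOMAINS = [(0, 1), (0, 3), (0, 1, 2, 3)]  # domains of x, y, z
A, B = (0, 3, 1), (1, 0, 2)


def supported(check):
    """Per variable, the set of values occurring in at least one tuple accepted by `check`."""
    sols = [t for t in product(*DOMAINS) if check(t)]
    return [{t[i] for t in sols} for i in range(len(DOMAINS))]


between_only = supported(lambda t: between(A, t, B))
exactly_only = supported(lambda t: exactly_one(t, {0}))
both = supported(lambda t: between(A, t, B) and exactly_one(t, {0}))

print(between_only, exactly_only, both)
```

Under these assumptions every value keeps a support in each constraint taken separately, while z = 0 loses its support once the two constraints are considered together.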



5 Handling Relaxation for a Counter-Free Automaton 

This section presents a filtering algorithm for handling constraint relaxation under the 
hypothesis that we don’t use any counter in our automaton. It can be seen as a general- 
ization of the algorithm used for the regular constraint [25]. 

Definition 1. The violation cost of a global constraint is the minimum number of sub- 
sets of its signature argument for which it is necessary to change at least one variable 
in order to get back to a solution. 

When these subsets form a partition over the variables of the constraint and when they 
consist of a single element, this cost is in fact the minimum number of variables to 
unassign in order to get back to a solution. As in [26], we add a cost variable cost as
an extra argument of the constraint. Our filtering algorithm first evaluates the minimum
cost value $\mathit{min}$. Then, according to max(cost), it prunes values that cannot belong to
a solution.

Example 5. Consider the constraint global_contiguity($[V_0, V_1, V_2, V_3, V_4, V_5, V_6]$)
with the following current domains for variables $V_i$: [{0, 1}, {1}, {1}, {0},
{1}, {0, 1}, {1}]. The constraint is violated because there are necessarily at least two
distinct sequences of consecutive 1s. To get back to a state that can lead to a solution,
it is enough to turn the fourth value to 1. One can deduce $\mathit{min} = 1$. Consider now
the relaxed form soft_global_contiguity($[V_0, V_1, V_2, V_3, V_4, V_5, V_6]$, cost)
and assume max(cost) = 1. The filtering algorithm should remove value 0 from
$V_5$. Indeed, selecting value 0 for variable $V_5$ entails a minimum violation cost of 2.
Observe that for this constraint the signature variables $S_0, S_1, S_2, S_3, S_4, S_5, S_6$ are
$V_0, V_1, V_2, V_3, V_4, V_5, V_6$.

As in the algorithm of Pesant [25], our consistency algorithm builds a layered
acyclic directed multigraph Q. Each layer of Q contains a different node for each state of
our automaton. Arcs only appear between consecutive layers. Given two nodes $n_1$ and
$n_2$ of two consecutive layers, $q_1$ and $q_2$ denote their respective associated states. There is
an arc a from $n_1$ to $n_2$ iff, in the automaton, there is an arc arc($q_1$, v, $q_2$) from $q_1$ to $q_2$.
The arc a is labeled with the value v. Arcs corresponding to transitions that cannot be
triggered according to the current domain of the signature variables $S_0, \ldots, S_{m-1}$ are
marked as infeasible. All other arcs are marked as feasible. Finally, we discard isolated
nodes from our layered multigraph. Since our automaton has a single initial state and a
single final state, Q has one source and one sink, denoted by source and sink.

Example 5 continued. Part (A) of Fig. 5 recalls the automaton of the global_contiguity
constraint, while part (B) gives the multigraph Q associated with the
soft_global_contiguity constraint previously introduced. Each arc is labeled by the
condition associated to its corresponding transition. Each node contains the name of 
the corresponding automaton state. Numbers in a node will be explained later on. In- 
feasible arcs are represented with a dotted line. 




[Figure 5, panels: (A) global_contiguity; (B) global_contiguity according to $S_0, S_1, \ldots, S_6$.]



Fig. 5. Relaxing the global_contiguity constraint. 



We now explain how to use the multigraph Q to evaluate the minimum violation cost
$\mathit{min}$ and to prune the signature variables according to the maximum allowed violation
cost max(cost). Evaluating the minimum violation cost $\mathit{min}$ can be seen as finding
the path from the source to the sink of Q that contains the smallest number of infeasible
arcs. This can be done by performing a topological sort starting from the source of Q.
While performing the topological sort, we compute for each node $n_k$ of Q the minimum
number of infeasible arcs from the source of Q to $n_k$. This number is recorded
in before[$n_k$]. At the end of the topological sort, the minimum violation cost $\mathit{min}$ we
search for is equal to before[sink].

Notation 1 Let i be assignable to a signature variable $S_l$. $\mathit{min}_l^i$ denotes the minimum
violation cost value according to the hypothesis that we assign i to $S_l$.

To prune domains of signature variables we need to compute the quantity $\mathit{min}_l^i$. In
order to do so, we introduce the quantity after[$n_k$] for a node $n_k$ of Q: after[$n_k$] is
the minimum number of infeasible arcs over all paths from $n_k$ to sink. It is computed
by performing a second topological sort starting from the sink of Q. Let $A_l^i$ denote
the set of arcs of Q, labeled by i, for which the origin has a rank of l. The quantity
$\min_{a \to b \in A_l^i}(\mathit{before}[a] + \mathit{after}[b])$ represents the minimum violation cost under the
hypothesis that $S_l$ remains assigned to i. If that quantity is greater than $\mathit{min}$ then there is no
path from source to sink which uses an arc of $A_l^i$ and which has a number of infeasible
arcs equal to $\mathit{min}$. In that case the smallest cost we can achieve is $\mathit{min} + 1$. Therefore
we have:

$$\mathit{min}_l^i = \min\Bigl(\min_{a \to b \in A_l^i}\bigl(\mathit{before}[a] + \mathit{after}[b]\bigr),\ \mathit{min} + 1\Bigr)$$

The filtering algorithm is then based on the following theorem: 

Theorem 1. Let i be a value from the domain of a signature variable $S_l$. If $\mathit{min}_l^i >$
max(cost) then i can be removed from $S_l$.

The cost of the filtering algorithm is dominated by the two topological sorts. They have 
a cost proportional to the number of arcs of Q which is bounded by the number of sig- 
nature variables times the number of arcs of the automaton. 

Example 5 continued. Let us come back to the instance of Fig. 5. Beside the state's
name, each node $n_k$ of part (B) of Fig. 5 gives the values of before[$n_k$] and of after[$n_k$].
Since before[sink] = 1, the minimum violation cost is equal to 1. Pruning
can potentially be done only for signature variables having more than one value. In our
example this corresponds to variables $V_0$ and $V_5$. So we evaluate the four quantities
$\mathit{min}_0^0 = \min(0 + 1, 2) = 1$, $\mathit{min}_0^1 = \min(0 + 1, 2) = 1$, $\mathit{min}_5^0 = \min(\min(3 + 0, 1 + 1, 1 + 1), 2) = 2$,
$\mathit{min}_5^1 = \min(\min(3 + 0, 1 + 0), 2) = 1$. If max(cost) is equal
to 1 we can remove value 0 from $V_5$. The corresponding arcs are depicted with a thick
line in Fig. 5.
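
The two passes and the pruning rule of Theorem 1 can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation. Since the figure is not reproduced here, the automaton is an assumption: three states s (no 1 seen yet), n (inside the run of 1s) and z (after the run), all of which may end the sequence. Function and variable names are hypothetical.

```python
from math import inf

# Assumed automaton for global_contiguity: (state, value) -> next state.
TRANSITIONS = {("s", 0): "s", ("s", 1): "n", ("n", 1): "n", ("n", 0): "z", ("z", 0): "z"}
STATES = ("s", "n", "z")
INITIAL = "s"
ACCEPTING = {"s", "n", "z"}  # states from which the end symbol leads to the sink


def violation_costs(domains):
    """Return (min, {(l, i): min_l^i}) for variables with more than one value."""
    m = len(domains)

    def arc_cost(l, v):
        # An arc of layer l labeled v is infeasible when v is not in the domain of S_l.
        return 0 if v in domains[l] else 1

    # before[l][q]: fewest infeasible arcs from the source to state q in layer l.
    before = [{q: inf for q in STATES} for _ in range(m + 1)]
    before[0][INITIAL] = 0
    for l in range(m):
        for (q, v), q2 in TRANSITIONS.items():
            before[l + 1][q2] = min(before[l + 1][q2], before[l][q] + arc_cost(l, v))

    # after[l][q]: fewest infeasible arcs from state q in layer l to the sink.
    after = [{q: inf for q in STATES} for _ in range(m + 1)]
    for q in ACCEPTING:
        after[m][q] = 0
    for l in range(m - 1, -1, -1):
        for (q, v), q2 in TRANSITIONS.items():
            after[l][q] = min(after[l][q], after[l + 1][q2] + arc_cost(l, v))

    minimum = min(before[m][q] for q in ACCEPTING)

    costs = {}
    for l, dom in enumerate(domains):
        if len(dom) < 2:
            continue  # the text only considers variables having more than one value
        for i in dom:
            best = min(before[l][q] + after[l + 1][q2]
                       for (q, v), q2 in TRANSITIONS.items() if v == i)
            costs[(l, i)] = min(best, minimum + 1)
    return minimum, costs


DOMAINS = [{0, 1}, {1}, {1}, {0}, {1}, {0, 1}, {1}]
minimum, costs = violation_costs(DOMAINS)
pruned = sorted(k for k, c in costs.items() if c > 1)  # max(cost) = 1
print(minimum, costs, pruned)
```

On the domains of Example 5 this sketch yields the four quantities computed above, so that only value 0 of $V_5$ is pruned when max(cost) = 1.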

6 Conclusion and Perspectives 

The automaton description introduced in this article can be seen as a restricted program- 
ming language. This language is used for writing down a constraint checker, which ver- 
ifies whether a ground instance of a constraint is satisfied or not. This checker allows 
pruning the variables of a non-ground instance of a constraint by simulating all poten- 
tial executions of the corresponding program according to the current domain of the 
variables of the relevant constraint. This simulation is achieved by encoding all poten- 
tial executions of the automaton as a conjunction of signature and transition constraints 
and by letting the usual constraint propagation deduce all the relevant information.
We want to stress the key points and the different perspectives of this approach: 

- Within the context of global constraints, it was implicitly assumed that providing a
constraint checker is a much easier task than coming up with a filtering algorithm. 
It was also commonly admitted that the design of filtering algorithms is a difficult 
task which involves creativity and which cannot be automatized. We have shown 
that this is not the case any more if one can afford to provide a constraint checker. 

- Non-determinism has played a key role by augmenting programming languages
with backtracking facilities [15], which was the origin of logic programming.
Non-determinism also has a key role to play in the systematic design of filtering
algorithms: finding a filtering algorithm can be seen as the task of executing in a
non-deterministic way the deterministic program corresponding to a constraint checker
and extracting the information that is certain to hold under all circumstances.
This can indeed be achieved by using constraint programming.

- A natural continuation would be to extend the automaton description in order to get 
closer to a classical imperative programming language. This would allow reusing 
directly available checkers in order to systematically get a filtering algorithm. 

- Other structural conditions on the signature and transition constraints could be iden- 
tified to guarantee arc-consistency for the original global constraint. 

- An extension of our approach may give a systematic way to get an algorithm (not
necessarily polynomial) for decision problems for which one can provide a polynomial
certificate. From [30] the decision version of every problem in NP can be
formulated as follows: Given x, decide whether there exists y so that $|y| \le m(x)$
and R(x, y). Here x is an instance of the problem; y is a short YES-certificate for this
instance; R(x, y) is a polynomial time decidable relation that verifies certificate y
for instance x; and m(x) is a computable and polynomially bounded complexity
parameter that bounds the length of the certificate y. In our context, if $|y|$ is fixed
and known, x is a global constraint and its $|y|$ variables with their domains; y is a
solution to that global constraint; R(x, y) is an automaton which encodes a checker
for that global constraint.

Abstract. Constraint programming is rapidly becoming the technology of choice
for modeling and solving complex combinatorial problems. However, users of 
constraint programming technology need significant expertise in order to model 
their problem appropriately. The lack of availability of such expertise can be a 
significant bottleneck to the broader uptake of constraint technology in the real 
world. In this paper we are concerned with automating the formulation of con- 
straint satisfaction problems from examples of solutions and non-solutions. We 
combine techniques from the fields of machine learning and constraint program- 
ming. In particular we present a portfolio of approaches to exploiting the seman- 
tics of the constraints that we acquire to improve the efficiency of the acquisition 
process. We demonstrate how inference and search can be used to extract useful 
information that would otherwise be hidden in the set of examples from which 
we learn the target constraint satisfaction problem. We demonstrate the utility of 
the approaches in a case-study domain. 



1 Introduction 

Constraint programming is rapidly becoming the technology of choice for modelling 
and solving complex combinatorial problems. However, users of constraint program- 
ming technology need significant expertise in order to model their problem appro- 
priately. The ability to assist users to model a problem in the constraint satisfaction 
paradigm is of crucial importance in making constraint programming accessible to non- 
experts. However, there are many obstacles which must be overcome. For example, in 
some situations users are not capable of fully articulating the set of constraints they wish 
to model. Instead users can only present us with example solutions and non-solutions of 
the target constraint satisfaction problem (CSP) they wish to articulate. This situation 
arises in many real-world scenarios. In purchasing, a human customer may not be able 
to provide the sales agent with a precise specification of his set of constraints because 
he is unfamiliar with the technical terms that are required to specify each constraint. 
Alternatively, in a data-mining context we may have access to a large source of data
in the form of positive and negative examples, and we have been set the task of gen- 
erating a declarative specification of that data. Earlier work in this area has focused 
on the generalization problem, inspired by work from the field of Inductive Logic Pro- 
gramming [8]. Here we focus on combining techniques from constraint processing and 
machine learning to develop a novel approach to constraint acquisition. 

We have proposed an algorithm, CONACQ, that is capable of acquiring a model of a 
CSP from a set of examples [2]. The algorithm is based on version space learning [6]. 
Version spaces are a standard machine learning approach to concept learning. A version 
space can be regarded as a set of hypotheses for a concept that correctly classify the 
training data received; in Section 2 we shall present an example which will both serve
a pedagogical role and demonstrate the problem we address in this paper.

However, the CONACQ algorithm suffers from a serious malady that has signifi- 
cant consequences for its ability to acquire constraint networks efficiently. In particular, 
this malady arises because we are acquiring networks of constraints, some of which 
may be redundant [1,3,10]. Informally, for now, we can regard a constraint as being 
redundant if it can be removed from a constraint network without affecting the set of so- 
lutions. While redundant constraints have no effect on the set of solutions to a CSP, they 
can have a negative effect on the acquisition process. In particular, when using version 
spaces to represent the set of consistent hypotheses for each constraint, redundancy can 
prevent us from converging on the most specific hypotheses for the target network, even 
though the set of training examples is sufficient for this to occur. As a consequence, for 
a given constraint in the network, its version space may not be sufficiently explicit, but 
rather contain constraints that are far too general. This is a significant problem since the 
size of each version space has a multiplicative effect on the number of possible CSPs 
that correctly classify the training examples. Furthermore, not having the most explicit 
constraints everywhere in the network can have a negative effect on the performance of 
some constraint propagation algorithms. 

In this paper we present a portfolio of approaches to handling redundant constraints 
in constraint acquisition. In particular, we address the issue of how to make each con- 
straint as explicit as possible based on the examples given. We shall present an approach 
based on the notion of redundancy rules, which can be regarded as a special case of
relational consistency [4]. We shall show that these rules can deal with some, but not all,
forms of redundancy. We shall then present a second approach, based on the notion of 
backbone detection [7], which is far more powerful. 

The remainder of this paper is organized as follows. Section 2 presents a simple 
example of how acquiring redundant constraints can have an adverse effect on the con- 
straint acquisition process. Section 3 presents some formal definitions of the concepts 
that underpin our approach. We formalize the notion of redundancy in constraint net- 
works, and show how the problem identified in Section 2 can be easily addressed. 
Section 4 presents a more powerful approach to dealing with redundancy due to dis- 
junctions of constraints. Section 5 presents an empirical evaluation of the various ap- 
proaches presented in the paper and presents a detailed discussion of our results. A 
number of concluding remarks are made in Section 6. 

2 An Illustrative Example



We demonstrate the potential problems that can arise due to redundancy during an inter- 
active acquisition session using CONACQ in an example scenario. Consider the hypoth- 
esis space of constraints presented in Figure 1(a). We assume in our example that all 
constraints in our target problem can be expressed using this hypothesis space. These 
constraints are arranged in a lattice, e.g. $\le$ includes both < and = and so appears above
< and =, such that more general constraints are placed higher in the hypothesis space.
The constraint $\top$ is the universal constraint: all tuples are accepted. The constraint $\bot$
is the null constraint: no tuples are accepted. Positive examples (solutions) and negative
examples (non-solutions) enable us to prune away portions of these spaces until,
ideally, we have completely identified the user’s problem by reducing the version space 
for each constraint to a single element. The term version space refers to the subset of 
the hypothesis space that remains as we prune away possibilities after each example is 
considered. The distinction will become clear as we consider an example below. 

The CONACQ algorithm maintains a separate version space for each potential con- 
straint in the CSP. A solution to the target CSP (positive example) provides examples 
for each constraint in the problem, since all constraints must be satisfied in order for 
an example to be classified as positive. However, negative examples are more problem- 
atic to process, since violating at least one constraint is sufficient for an example to be 
classified as negative. Therefore, a negative example provides a disjunction of possible 
examples. It is only when the algorithm can deduce which constraints must have been 
violated by the example, causing it to be classified as negative, that the appropriate 
version spaces can be updated. 

For the purposes of this example, we wish to acquire a CSP involving 3 variables,
$x_1$, $x_2$ and $x_3$, with domains $D(x_1) = D(x_2) = D(x_3) = \{1, 2, 3, 4\}$. The set of
constraints in the target network is $\{x_1 > x_2, x_1 > x_3, x_2 > x_3\}$. In Table 1 the set of
examples that will be provided to the acquisition system is presented. The set of
examples comprises one positive example (a solution to the target CSP) and two negative
examples (non-solutions). Figures 1(b)-1(d) illustrate the effect of each example in turn
on the version spaces of the constraints in the network.

Table 1. Examples for Fig. 1.

Figure 1(b) presents the state of each of the constraint version
spaces after the first (and only) positive example, $e_1^+$, has been
processed. We can see that the version space of each constraint now
contains four hypotheses: $>$, $\neq$, $\geq$ and $\top$. The other hypotheses
are eliminated because they are inconsistent with $e_1^+$. Specifically,
if $x_1 = 4 \wedge x_2 = 3$ can be part of a solution, then the constraint
between these variables must be more general than or equal to $>$.
Therefore, we can ignore the possibility that this constraint can be
either $=$, $<$, $\le$ or $\bot$. Essentially, we know that any CSP that can be expressed in terms
of the constraints presented in Figure 1(a) must comprise constraints that are no more
specific than those in the version spaces presented in Figure 1(b). Similar reasoning
allows us to reduce the version space for each constraint to that illustrated in Figure 1(b).
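
The effect of a positive example on a single version space can be sketched as follows. This is an illustrative encoding, not part of CONACQ itself: each constraint type is represented by the set of order relations it accepts between its two arguments, and the names `TYPES`, `order` and `process_positive` are hypothetical.

```python
# Each constraint type is encoded by the order relations it accepts between its
# first and second argument; the empty set encodes the null constraint.
TYPES = {
    "<": {"<"}, "=": {"="}, ">": {">"},
    "<=": {"<", "="}, ">=": {">", "="}, "!=": {"<", ">"},
    "T": {"<", "=", ">"}, "bottom": set(),
}


def order(u, v):
    """Order relation that holds between the two values u and v."""
    return "<" if u < v else ">" if u > v else "="


def process_positive(version_space, u, v):
    """Keep only the hypotheses that accept the pair (u, v) of a positive example."""
    return {t for t in version_space if order(u, v) in TYPES[t]}


# x1 = 4 and x2 = 3 appear together in the positive example.
vs12 = process_positive(set(TYPES), 4, 3)
print(sorted(vs12))
```

With this encoding, a type is more general than another exactly when its set of accepted relations is a superset, which matches the lattice ordering described for Figure 1(a).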

Figure 1(c) presents the effect of processing example $e_2^-$, the first negative example.
Of the three constraints in the problem, $e_2^-$ differs by only one constraint, $c_{12}$, compared
to the constraints implied by the positive example $e_1^+$. Therefore, we can further
refine the version space of constraint $c_{12}$ by removing both $\neq$ and $\top$. We illustrate this
by using a colored shading over those hypotheses that are removed from the version
space. Similarly, $e_3^-$ is classified as negative because of a single constraint,
namely $c_{23}$. Figure 1(d) illustrates the result of processing negative example $e_3^-$.

(a) Hypothesis space of constraints in the toolkit.
(b) Step 1: After processing the positive example $e_1^+$.
(c) Step 2: After processing the negative example $e_2^-$.
(d) Step 3: After processing the negative example $e_3^-$.

Fig. 1. Acquiring a redundant constraint prevents one version space from converging.

After processing the negative examples $e_2^-$ and $e_3^-$, the version spaces for the constraint
between variables $x_1$ and $x_2$ and between variables $x_2$ and $x_3$ are reduced to the set of
hypotheses $\{>, \geq\}$. However, the version space for the constraint between variables $x_1$
and $x_3$ has not. Instead, this version space contains four possible hypotheses:
$\{>, \geq, \neq, \top\}$.

This is unfortunate since we cannot now find a set of negative examples which will
help this version space to reduce any closer to the target constraint. For example, to
eliminate the hypothesis $\neq$, we need a negative example with $x_1 < x_3$ but necessarily
satisfying all other acquired constraints, i.e., satisfying their most specific possible
alternative: $x_1 > x_2$ and $x_2 > x_3$, so that the only possible reason to reject it is $x_1 < x_3$.
Clearly no such example exists. As a consequence, the version spaces in this scenario
cannot converge any further. However, it should be pointed out that convergence was
precluded in this case not by a deficiency in our set of examples, as we shall see in
Example 2 later in this paper, but by the attempt to acquire redundant
constraints using the CONACQ algorithm. Specifically, in our example the constraint
between $x_1$ and $x_3$ is redundant.

Therefore, it is clear that acquiring redundant constraints can prevent us from con- 
verging on the most specific hypotheses consistent with a set of examples. However, 
by exploiting the fact that we are acquiring constraint networks, we can rely on various 
search and inference techniques to help us leverage the learning power of the examples 
that have been provided to us. Ideally, we can exploit redundancy to help us to
converge on the target hypothesis much more quickly. In the next section we present an 
approach exploiting redundant constraints that overcomes the problem we have experi- 
enced in this example. 

3 Redundancy Rules 

In this section, we introduce formal definitions of the basic concepts used in this paper. 
We then propose definitions of redundancy and redundancy rules, before presenting an 
approach to dealing with redundant constraints in the CONACQ acquisition process. 

3.1 Basic Definitions 

A finite constraint network N consists of a finite set of variables $X = \{x_1, \ldots, x_n\}$,
a set of domains $D = \{D(x_1), \ldots, D(x_n)\}$, where the domain $D(x_i)$ is the finite set
of values that variable $x_i$ can take, and a set of constraints $C = \{c_1, \ldots, c_m\}$. Each
constraint $c_i$ is defined by the ordered set $var(c_i)$ of the variables it involves, and a
set $sol(c_i)$ of allowed combinations of values. An assignment of values to the variables
in $var(c_i)$ satisfies $c_i$ if it belongs to $sol(c_i)$. A solution to a constraint network is an
assignment of a value from its domain to each variable such that every constraint in
the network is satisfied. When all the constraints in C involve exactly 2 variables, we
say that the constraints and the network are binary. This is the case we will study in
the rest of the paper since it greatly simplifies notation. We will use $c(x_i, x_j)$ and $c_{ij}$
interchangeably to refer to $sol(c)$ where $var(c) = (x_i, x_j)$. However, all the results are
essentially the same for constraints of any arity.

As seen in the previous section, redundancy is a crucial notion that we need to tackle 
if we want to speed up version space convergence during the constraint acquisition 
process. 

Definition 1 (Redundancy) Given a constraint network $N = (X, D, C)$, we say that
a constraint $c \in C$ is redundant wrt N iff the set of solutions of N is the same as the
set of solutions of $N_{-c} = (X, D, C \setminus \{c\})$. We write $N_{-c} \models c$.
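
For small networks, Definition 1 can be checked directly by enumeration. The sketch below is a brute-force illustration on the target network of Section 2 (variables $x_1, x_2, x_3$ with domain {1, 2, 3, 4}); the function names and the dictionary encoding of constraints are assumptions of the sketch.

```python
from itertools import product


def solutions(domains, constraints):
    """All assignments (tuples indexed by variable position) satisfying every constraint."""
    return {t for t in product(*domains) if all(c(t) for c in constraints.values())}


def is_redundant(domains, constraints, name):
    """Definition 1: removing the constraint `name` leaves the solution set unchanged."""
    rest = {k: c for k, c in constraints.items() if k != name}
    return solutions(domains, constraints) == solutions(domains, rest)


DOMAINS = [range(1, 5)] * 3
TARGET = {
    "c12": lambda t: t[0] > t[1],  # x1 > x2
    "c13": lambda t: t[0] > t[2],  # x1 > x3
    "c23": lambda t: t[1] > t[2],  # x2 > x3
}

for name in TARGET:
    print(name, is_redundant(DOMAINS, TARGET, name))
```

Only the constraint between $x_1$ and $x_3$ is reported as redundant, which is the constraint whose version space fails to converge in the illustrative example.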

3.2 Redundancy in CONACQ 

The CONACQ algorithm has been proposed in [2]. Its inputs are a set X of variables
with their domains, a set of examples $E = E^+ \cup E^-$, and a bias B. An example $e \in E$
is an assignment of values to variables from X that must be a solution of the target
constraint network (if $e \in E^+$) or a non-solution (if $e \in E^-$).

The bias is composed of constraint scopes (sets of variables on which a constraint
c has to be guessed), associated with a set of constraint types that are the different
possibilities for $sol(c)$. In the simplest case, where we guess a complete network of
binary constraints, the bias contains all pairs of variables from X as possible scopes,
associated with all the binary constraint types available in the toolkit. The set of possible
constraints on $(x_i, x_j)$ is denoted by its bias, $B_{ij}$.

The output of CONACQ is any constraint network that has the same set X of variables
with their domains, and a set of constraints chosen from the bias such that every
element of $E^+$ is a solution and no element of $E^-$ is. Since the number of constraint
networks satisfying these criteria during the acquisition process can be huge (exponential),
CONACQ uses version space techniques and maintains only a most specific bound $S_{ij}$
and a most general bound $G_{ij}$ for each pair of variables $(x_i, x_j)$ belonging to the bias.
Any constraint in the toolkit subsumed by $G_{ij}$ and subsuming $S_{ij}$ is a candidate for $c_{ij}$
(namely, belongs to the version space).

Theorem 1. Let X, D, B, E be the input of CONACQ. Let $c_{ij} \in B_{ij}$. If there
exists $\{c_{ik}, c_{kj}\} \in B_{ik} \times B_{kj}$ such that $E \models \{c_{ik}, c_{kj}\}$ and $c_{ij}$ is redundant wrt
$(X, D, \{c_{ik}, c_{kj}\})$ then the version space cannot shrink its bounds on $(x_i, x_j)$ more
than $S_{ij} = c_{ij}$ and $G_{ij} = \top$.

Proof. Let $c'_{ij} \in B_{ij}$ be a constraint subsumed by $c_{ij}$. Suppose there exists $e \in E^-$ such
that e violates $c'_{ij}$. (This is the only way to remove $c'_{ij}$ from the version space.) We can
decrease the local general bound $G_{ij}$ under $c'_{ij}$ only if no other constraint in the version
space can reject e. Now, we know that $E \models \{c_{ik}, c_{kj}\}$. Hence, when e is presented, we
are guaranteed that $c_{ik}$ and $c_{kj}$ are still higher than their respective lower bounds $S_{ik}$
and $S_{kj}$ (otherwise E would cause some version spaces to collapse, and we could infer
what we want on $S_{ij}$ and $G_{ij}$). If e violates $c'_{ij}$, it also violates $c_{ij}$ since $c'_{ij}$ is subsumed
by $c_{ij}$. It thus violates $\{c_{ik}, c_{kj}\}$ since $c_{ij}$ is redundant wrt $(X, D, \{c_{ik}, c_{kj}\})$. As a
result, we cannot decide that $c'_{ij}$ is the necessary culprit for e's rejection since there
exist constraints between $S_{ik}$ and $c_{ik}$, and between $S_{kj}$ and $c_{kj}$, which are both in the
version space, and could reject e. So, $G_{ij}$ cannot decrease under $\top$.

Regarding $S_{ij}$, it will increase higher than $c_{ij}$ if and only if there exists $e \in E^+$
that violates $c_{ij}$. However, if e violates $c_{ij}$, it also violates $\{c_{ik}, c_{kj}\}$ (see above), which
contradicts the assumption that $E \models \{c_{ik}, c_{kj}\}$. □

3.3 Formal Definition of Redundancy Rules 

A constraint in a constraint network can be seen as a constraint type (or first order 
predicate) in which we substitute network variables for variables in the predicate. For 
example, the generic predicate $P(s, t)$ = '$s < t$' of arity $n(P) = 2$ can produce
the constraint $x_1 < x_2$ in a constraint network involving $x_1$ and $x_2$, or the constraint
$y_3 < y_5$ in another constraint network.

Since the process of modeling a problem is usually done using a given constraint 
toolkit, it seems reasonable to study the concept of redundancy with respect to the set 
of constraint types available in that toolkit. Let us first define the concept of redundancy 
rule for general constraint types. 

Definition 2 (Redundancy rule) Let T be a set of constraint types. The Horn clause

$$\forall t_1, \ldots, t_n,\quad \bigwedge_i P_i(t_{i_1}, \ldots, t_{i_{n(P_i)}}) \models Q(t_{j_1}, \ldots, t_{j_{n(Q)}})$$

with $P_i \in T$ for all i, and $Q \in T$, is a redundancy rule wrt T iff there is at least one
variable $t_{j_h}$ in Q that appears in some $P_i$, and for any constraint network N for which
a substitution $\theta$ (see footnote 1) maps the rule into N, we have

$$N_{-\theta(Q)} \models \theta(Q).$$

If $|\{P_i\}| = k$, we say that the rule is a k-redundancy rule.

We immediately focus our attention on redundancy rules in a binary constraints 
setting where, if in addition we work on a complete network of binary constraints, it is 
sufficient to deal with 2-redundancy rules [5]. 

Definition 3 (Binary redundancy rule) Let T be a set of constraint types of arity 2. A
binary redundancy rule is a redundancy rule wrt T of the form:

$$\forall t_1, t_2, t_3,\quad P_1(t_1, t_2) \wedge P_2(t_2, t_3) \models Q(t_1, t_3).$$

Example 1 The Horn clause $\forall x, y, z,\ (x \geq y) \wedge (y \geq z) \models (x \geq z)$ is a binary
redundancy rule since any constraint network in which we have two constraints '$\geq$'
such that the second argument of the first constraint is equal to the first argument of the
second constraint subsumes the '$\geq$' constraint between the first argument of the first
constraint and the second argument of the second constraint.

Given the set T of constraint types available in a toolkit, redundancy rules can be 
built for the toolkit independently of the problem we will acquire. Thus, redundancy 
rules can be included as part of the constraint toolkit, in much the same way as propa- 
gators are often included in constraint toolkits, at least for the most common constraints. 

3.4 Redundancy Rules in CONACQ 

We saw in Theorem 1 that it can sometimes occur that the local version space for the
constraint between a pair of variables $(x_i, x_j)$ can reach a state where it becomes
impossible to make its general bound more specific (thus reducing its size) because it
contains a constraint that is redundant with respect to the other constraints already learned
by CONACQ. To avoid this problem, we can simply trigger the relevant redundancy rule
from the toolkit each time its left-hand side is true, namely the rule becomes "active"
in a version space.

Definition 4 (Active Redundancy Rule) Given a binary rule $R = P_1(t_1, t_2) \wedge P_2(t_2, t_3)
\models Q(t_1, t_3)$, a version space V, and a mapping $\theta$ substituting variables of V for
variables in R, we say that R is active in V wrt $\theta$ if $P_1(\theta(t_1), \theta(t_2))$ is subsumed by
$G(\theta(t_1), \theta(t_2))$, and $P_2(\theta(t_2), \theta(t_3))$ is subsumed by $G(\theta(t_2), \theta(t_3))$.

1 As in most toolkits, we require that $\theta$ is 'locally' injective, namely two different $t_{i_h}$'s in the
same $P_i$ cannot map on the same network variable.

Definition 5 (Satisfying a Redundancy Rule) Let $\theta$ be a mapping substituting variables
of a version space V for variables in a rule $R = P_1(t_1, t_2) \wedge P_2(t_2, t_3) \models
Q(t_1, t_3)$. We say that R is satisfied on V wrt $\theta$ if $Q(\theta(t_1), \theta(t_3))$ is subsumed by
$G(\theta(t_1), \theta(t_3))$.

Thus, when a rule R is active with respect to a mapping $\theta$, we can force it to be
satisfied (or apply it) by modifying the general bound of the constraint on which $\theta$ maps its
right-hand side. This modification does not affect the set of possible networks admitted
by the version space. We state this more formally in Definition 6 and Theorem 2.

Definition 6 (Version Space Equivalence) Let V and V' be two version spaces
defined on the same variables and bias. We say that V and V' are equivalent iff for any
constraint network N obtained by picking a constraint between $S_{ij}$ and $G_{ij}$ for each
$(x_i, x_j)$ in V there exists a constraint network N' obtained the same way from V' such
that N and N' have the same solutions.

Theorem 2. Let V be a version space. Let V' be the version space obtained after a
rule R has been applied to V. If R was active on V, then V' and V are equivalent.

Proof. Suppose there exists a constraint network N in V for which none of the
constraint networks in V' have the same set of solutions. This means that the constraint
$r_{ij}$ added by the rule R has decreased the general bound $G'_{ij}$ in V'. The constraints
allowed by $G'_{ij}$ all reject some solution of N (by assumption). This is necessarily due
to $r_{ij}$. Thus, $r_{ij}$ cannot be redundant wrt N. By definition of what an active redundancy
rule is, we deduce that R cannot be active in V, which contradicts the assumption. □

This property guarantees that we can safely apply all the redundancy rules that are
active, reducing the size of the version space while its semantics is not affected.

The complexity of applying all the binary rules in a version space is in
$O(m \times |B|^2)$, with $|B|$ the number of constraint scopes in the bias and $m$ the
number of binary rules in the toolkit. For $k$-redundancy rules this is in
$O(m \times |B|^k)$. Applying $k$-redundancy rules to a constraint network is a relaxation
of relational $k$-consistency [4]. However, relational $k$-consistency requires space
exponential in the number of variables in the redundant constraint while in our approach
we only generate constraints from the toolkit, thus keeping constant space for each
constraint.

Example 2 We now apply the method above to the example of Figure 1. After processing
the examples $\{e_1^+, e_2^-, e_3^-\}$, we know that even in the loosest constraint network
still possible, we have $x_1 \ge x_2$ and $x_2 \ge x_3$. Therefore, the rule described in
Example 1 is active. By applying it, we can reduce the possible constraint types between
$x_1$ and $x_3$ to $\{\ge, >\}$.
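The rule application in Example 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CONACQ implementation: constraint types are modelled as sets of the basic relations <, =, >; a premise is treated as holding when the general bound is at least as tight as the premise (our reading of the subsumption test in Definition 4); and the names `TYPES`, `apply_rule` and `hypotheses`, as well as the specific bound > assumed on $(x_1, x_3)$, are hypothetical.

```python
# Constraint types encoded as the set of basic relations they allow
# (an assumed encoding, chosen for illustration only).
TYPES = {
    "<": frozenset("<"), "=": frozenset("="), ">": frozenset(">"),
    "<=": frozenset("<="), ">=": frozenset(">="), "!=": frozenset("<>"),
    "T": frozenset("<=>"),  # the universal constraint
}

def apply_rule(general, rule, i, j, k):
    """Apply P1(i,j) and P2(j,k) |= Q(i,k) if the rule is active.

    `general` maps a scope to its general bound. Returns True if applied.
    """
    p1, p2, q = (TYPES[name] for name in rule)
    # Assumed activity test: each general bound is at least as tight as its premise.
    active = general[(i, j)] <= p1 and general[(j, k)] <= p2
    if active:
        general[(i, k)] = general[(i, k)] & q  # tighten the general bound
    return active

def hypotheses(specific, general_bound):
    """Names of the constraint types lying between the two bounds."""
    return {name for name, rel in TYPES.items() if specific <= rel <= general_bound}

# General bounds after the examples of Example 2; (x1, x3) is still unconstrained.
general = {(1, 2): TYPES[">="], (2, 3): TYPES[">="], (1, 3): TYPES["T"]}
specific_13 = TYPES[">"]  # assumption: specific bound on (x1, x3)

before = hypotheses(specific_13, general[(1, 3)])
applied = apply_rule(general, (">=", ">=", ">="), 1, 2, 3)
after = hypotheses(specific_13, general[(1, 3)])
print(sorted(before), "->", sorted(after))
```

Under these assumptions the local version space on $(x_1, x_3)$ shrinks from four hypotheses to the two quoted in the example.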

4 Higher-Order Redundancy 

While redundancy rules can handle a particular type of redundancy, there are cases 
where applying these rules on the version space is not sufficient to find all redundan- 
cies. Redundancy rules are well-suited to discovering constraints that are redundant 
because of conjunctions of other constraints. However, as we shall show in Section 4.1,
a constraint can be redundant because of a conjunction of disjunctions of constraints. 
We refer to this as higher-order redundancy. Since our redundancy rules are in the form 
of Horn clauses, they cannot tackle such redundancies. After a brief description of the 
way CONACQ stores the information about negative examples, we will show how to 
tackle these complex redundancies. 



4.1 Another Example 

In the scenario illustrated in Figure 2, we use the same set of variables and domains
as those used in the example presented in Section 2. However, in this case the target
network comprises the set of constraints $\{x_1 = x_2, x_1 = x_3, x_2 = x_3\}$. Furthermore,
in this example all negative instances differ from $e_1^+$ by at least two constraints (see
the table in Figure 2).
Fig. 2. None of the version spaces have converged.



After processing the positive example $e_1^+$, each version space contains four consistent
hypotheses, the most specific hypothesis in each being =. The version spaces are
depicted in Figure 2. However, each of the negative examples does not contain enough
information to immediately eliminate any additional hypotheses from the version spaces
for our constraints. For example, negative example $e_2^-$ may be negative because of
either constraint $c_{12}$ or $c_{23}$, or indeed both. Therefore, none of the version
spaces of the constraints in our example can be reduced further (indicated with light
shading in Figure 2, as opposed to the darker shade used earlier to depict hypotheses being
removed from a version space). The version spaces in this example each contain 4 hypotheses
due to the disjunction of possible reasons that would classify the negative examples
correctly.

Without any further information, particularly negative examples which differ from 
the positive example by one constraint, no further restrictions can be made on the ver- 
sion spaces of the constraints in our problem. Consequently, none of the version spaces 
converge. Simply applying redundancy rules also does not help. An alternative approach 
is required, which will be presented next. 
4.2 Storing Negative Examples in CONACQ

As briefly described above, when a negative example $e^-$ is presented to CONACQ it
is encoded as a clause $cl_e = l_{ij}^{U_{ij}} \vee \ldots \vee l_{i'j'}^{U_{i'j'}}$, where
$U_{ij}$ is the set of most general constraint types available for $c_{ij}$ that reject
$e^-$ (i.e., that are violated by $e^-$). The literal $l_{ij}^{U_{ij}}$ is true if any
possible constraint type for $c_{ij}$ in its local version space is at least as specific as
the given bound $U_{ij}$. This is the case if any constraint type $r$ in the general bound
$G_{ij}$ of $c_{ij}$ is at least as specific as a constraint type in $U_{ij}$. This is the
condition for $c_{ij}$ to reject $e^-$. Hence, the clause $cl_e$ means that at least one of
the constraints $c_{ij}$ having a literal in $cl_e$ has to be at least as specific as its
$U_{ij}$ to reject $e^-$.

We should point out that a clause does not necessarily contain a literal for each
constraint we have to find in the bias. Each constraint $c_{ij}$ for which the specific
bound $S_{ij}$ is already more general than $U_{ij}$ will not reject $e^-$. It is then
useless to put a literal for it in the clause since this literal will be forced to be
false. For example, if $e_1^- = \{x_1 = 1; x_2 = 1; x_3 = 3\}$ and $S_{12} = \{\ge\}$,
$c_{12}$ cannot reject $e_1^-$. In addition, not all elements of $E^-$ have a stored clause
in CONACQ. It can indeed happen that an example is already definitively rejected by some
constraint in the version space. For example, take again the $e_1^-$ above and imagine
$G_{23} = \{\ge\}$: $e_1^-$ cannot satisfy $c_{23}$. Hence, it is useless to add a clause in
CONACQ to express that $e_1^-$ should be rejected.

The set of all the clauses containing the necessary information about $E^-$ is denoted
by $\mathcal{K}$. Since a constraint network assigns a single constraint $c_{ij}$ to each
pair of variables $(x_i, x_j)$, it leads to an interpretation for every literal in
$\mathcal{K}$. By construction, it is guaranteed that for any constraint network leading to
a satisfying interpretation for $\mathcal{K}$, all $e^- \in E^-$ are non-solutions. (See [2]
for more details.)
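The encoding just described can be sketched as follows. This is a minimal illustration under assumptions rather than the CONACQ code: constraint types are modelled as sets of basic relations, the function names are hypothetical, and the bounds chosen for $c_{13}$ and $c_{23}$ in the usage lines are invented for the example. The function returns no clause when some general bound already rejects the example, and omits a literal for any constraint whose specific bound already accepts it.

```python
# Constraint types as sets of allowed basic relations (assumed encoding).
TYPES = {
    "<": frozenset("<"), "=": frozenset("="), ">": frozenset(">"),
    "<=": frozenset("<="), ">=": frozenset(">="), "!=": frozenset("<>"),
    "T": frozenset("<=>"),
}

def basic_relation(a, b):
    """Basic relation holding between two values."""
    return "<" if a < b else "=" if a == b else ">"

def encode_negative(example, bounds):
    """Return {scope: U_ij} for a negative example, or None if no clause is needed.

    `bounds` maps a scope (i, j) to (specific bound, general bound).
    """
    clause = {}
    for (i, j), (specific, general) in bounds.items():
        rel = basic_relation(example[i], example[j])
        if rel not in general:
            return None  # every remaining type on this scope rejects the example
        if rel in specific:
            continue     # no remaining type on this scope can reject the example
        # Types still in the local version space that are violated by the example.
        rejecting = {name: t for name, t in TYPES.items()
                     if specific <= t <= general and rel not in t}
        # Keep only the most general ones (maximal under set inclusion).
        most_general = {name for name, t in rejecting.items()
                        if not any(t < u for u in rejecting.values())}
        if most_general:
            clause[(i, j)] = most_general
    return clause

e = {1: 1, 2: 1, 3: 3}  # the example x1 = 1, x2 = 1, x3 = 3 from the text
bounds = {
    (1, 2): (TYPES[">="], TYPES["T"]),  # S12 accepts e: no literal for c12
    (1, 3): (TYPES["="], TYPES["T"]),   # assumed bounds
    (2, 3): (TYPES["="], TYPES["T"]),   # assumed bounds
}
clause = encode_negative(e, bounds)
print(clause)

# If the general bound on (x2, x3) were >, the example would already be rejected.
already_rejected = encode_negative(e, {**bounds, (2, 3): (TYPES[">"], TYPES[">"])})
print(already_rejected)
```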

4.3 Finding Higher-Order Redundancies 

In the example in Section 4.1 we have seen a case where a constraint is implied by
the set of negative examples received by CONACQ but redundancy rules are not able
to detect this by themselves. However, all the information necessary to deduce this
constraint is contained in the set of redundancy rules and the set $\mathcal{K}$ of clauses
encoding the negative examples. The reason for their inability to detect it is that rules
are in the form of Horn clauses that we apply only when all predicates in the left-hand side
are true (i.e., we apply unit propagation on these clauses). To tackle this issue we can
build the set $\mathcal{R}$ of all possible substitutions on the given bias for available
rules. For each rule $R = P_1(t_1, t_2) \wedge P_2(t_2, t_3) \models Q(t_1, t_3)$, for each
substitution $\theta$ that maps the $P_i$'s and $Q$ on possible constraints in the bias, a
clause $\neg l^{P_1}_{\theta(t_1)\theta(t_2)} \vee \neg l^{P_2}_{\theta(t_2)\theta(t_3)} \vee l^{Q}_{\theta(t_1)\theta(t_3)}$
is added to the set $\mathcal{R}$. This process can be done as soon as the bias is given,
before the beginning of the acquisition process.

In addition, since the semantics of a literal $l_{ij}^{U}$ is: ‘$c_{ij}$ is at least as
specific as $U$’, we need also to link literals involving the same constraint scope. For
example, if a literal $l_{ij}^{U}$ is true, then a literal $l_{ij}^{U'}$ for a more general
bound $U'$ should not be able to take the value false. Hence, we need a third set of
clauses, the set $\mathcal{L}$ containing $\neg l_{ij}^{U} \vee l_{ij}^{U'}$ for each pair
$(x_i, x_j)$ such that $U$ subsumes $U'$. These subsumption clauses between two literals
$l_{ij}^{U}$ and $l_{ij}^{U'}$ need only to be included if $l_{ij}^{U}$ appears in
$\mathcal{K}$ and subsumes $l_{ij}^{U'}$ that appears in $\mathcal{R}$. Adding
subsumption clauses between two literals in $\mathcal{K}$ would not activate any more rules.
This is an important property since the fact that $l_{ij}^{U'}$ comes from $\mathcal{R}$
implies that $|U'| = 1$, which ensures polynomial space for $\mathcal{L}$.

We now have a base of ground clauses, $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$, that
contains all available information about rules and negative examples. If a literal
$l_{ij}^{U}$ in $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$ appears positively in all
models of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$ (i.e., it belongs to the backbone
[7]), we can reduce the local version space of $c_{ij}$ to constraints at least as specific
as $U$. By construction of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$, it is indeed
impossible to assign $c_{ij}$ to a constraint more general than $U$ and at the same time
reject all negative instances in $E^-$.

Therefore, after the presentation of a new negative instance $e^-$ from $E^-$, we have to
build the corresponding clause $cl_e$, add it to $\mathcal{K}$, update $\mathcal{L}$ if
necessary, and test if the addition of $cl_e$ causes some literal (see footnote 2) to enter
the backbone of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$.

The process that we described above guarantees that all the possible redundancies 
will be detected. 

Theorem 3. Given a version space $V$, a set $E = E^+ \cup E^-$ of examples, a constraint
type $r$, and the sets $\mathcal{K}$, $\mathcal{R}$, $\mathcal{L}$ built as described above,
if $r$ is a possible constraint on $(x_i, x_j)$ and $r$ can be inferred from $V$, the set of
rules of the toolkit, and $E^-$, then the literal $l_{ij}^{r}$ is a member of the backbone
of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$.

Proof. Let $r$ be a (most specific) possible constraint on $(x_i, x_j)$ that can be inferred
from $V$, the set of rules of the toolkit, and $E^-$. Suppose $l_{ij}^{r}$ does not belong
to the backbone of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$. By assumption, $r$ is
the head of some rules in the toolkit (otherwise CONACQ by itself can learn $r$ on
$(x_i, x_j)$). Then, $l_{ij}^{r}$ is the head of a subset $\mathcal{R}'$ of the rules in
$\mathcal{R}$. Then there exists a model $M$ of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$
for which none of the rules $R \in \mathcal{R}'$ has all the literals of its tail set to
true. There are two cases. Either none of the networks $N_M$ built from $M$ allow a solution
violating $r$ on $(x_i, x_j)$, which means that a rule that would infer $l_{ij}^{r}$ from
$M$ is missing in $\mathcal{R}$, or some $N_M$ allows solutions violating $r$ on
$(x_i, x_j)$, which means that $r$ cannot be inferred since there exists a network rejecting
all $E^-$ (by construction of $N_M$), and allowing solutions rejected by $r$ on
$(x_i, x_j)$. Both cases contradict the assumption. Finally, if $r$ was not the most
specific constraint that could be learned on $(x_i, x_j)$ (for example $r$ = ‘$\le$’ while
$l_{ij}^{<}$ was inferred) the proof holds for the most specific constraint $r'$, and the
clauses added to $\mathcal{L}$ permit us to infer $l_{ij}^{r}$ from $l_{ij}^{r'}$. □

However, this process is quite expensive from a computational point of view, since 
testing if a literal belongs to the backbone of a formula is a coNP-complete problem. 
This prevents the use of such a technique on big formulae, but as we are concerned with 
an interactive acquisition process, it is reasonable to assume that the version spaces 
we need to handle will be small enough to permit a human user to deal with them, 
and consequently we expect that the speed-of-response for backbone detection will be 
acceptable. The experimental section will discuss this feature more deeply. 
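As a rough illustration of the backbone test, the sketch below enumerates all models of a small clause base by brute force and keeps the variables that are true in every model. It is a stand-in under assumptions, not the procedure used in the paper (which calls a SAT solver): the clause base is a hypothetical three-literal instance with one Horn rule, clauses use DIMACS-style signed integers, and the function names are invented.

```python
from itertools import product

def models(clauses, num_vars):
    """Yield every assignment (tuple of booleans) satisfying all clauses."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield bits

def positive_backbone(clauses, num_vars):
    """Variables that are true in every model of a satisfiable clause base."""
    all_models = list(models(clauses, num_vars))
    if not all_models:
        raise ValueError("clause base is unsatisfiable")
    return {v for v in range(1, num_vars + 1)
            if all(model[v - 1] for model in all_models)}

# Hypothetical literals: 1, 2 and 3 each stand for one literal l_ij^U.
rule = [-3, -2, 1]                 # Horn clause: literal 3 and literal 2 imply literal 1
after_first = [[1, 2], rule]       # one negative example stored
after_second = [[1, 2], [3, 1], rule]  # a second negative example stored

print(positive_backbone(after_first, 3))
print(positive_backbone(after_second, 3))
```

In this toy instance no literal is forced after the first clause, while the second clause forces literal 1 into the backbone even though unit propagation alone never fires the rule. The enumeration is exponential in the number of literals, which is why a SAT solver is the practical choice.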



2 Note that a literal is a candidate to enter the backbone only if it appears in the right-hand side of
a Horn clause from $\mathcal{R}$ (or it belongs to a unary clause, obviously). Furthermore, the backbone
cannot contain negative literals since $\mathcal{R}$ and $\mathcal{L}$ are Horn bases and $\mathcal{K}$
contains only positive clauses.
Example 3 We now apply the above method to the example presented in Section 4.1.
The set $\mathcal{R}$ of redundancy rules used in this example is presented in Table 2. It
provides a subset of possible binary rules associated with the bias in Figure 1(a). As
presented previously, the set $\mathcal{L}$ is built dynamically only when required by
$\mathcal{K}$ and $\mathcal{R}$, so we initialize it to $\emptyset$.



Table 2. Binary redundancy rules for the sample problem. 
After receiving $e_2^-$, $\mathcal{K} = \{(l_{13}^{\ge} \vee l_{23}^{\ge})\}$. To apply the
technique presented in Section 4.3, we test if either $l_{13}^{\ge}$ or $l_{23}^{\ge}$
belongs to the backbone of $\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$. However,
running a SAT solver allows us to determine that both
$\mathcal{K} \cup \mathcal{R} \cup \mathcal{L} \cup \{\neg l_{13}^{\ge}\}$ and
$\mathcal{K} \cup \mathcal{R} \cup \mathcal{L} \cup \{\neg l_{23}^{\ge}\}$ have solutions.
Since the backbone detection did not find any literal, at this stage, we cannot deduce
anything more than using previous methods.

However, after receiving $e_3^-$,
$\mathcal{K} = \{(l_{13}^{\ge} \vee l_{23}^{\ge}), (l_{12}^{\ge} \vee l_{13}^{\ge})\}$ and we
detect the new backbone. We run a SAT solver on
$\mathcal{R} \cup \mathcal{K} \cup \mathcal{L} \cup \{\neg l_{13}^{\ge}\}$ and because of the
minimal conflict set $\mathcal{K} \cup \{r_i\} \cup \{\neg l_{13}^{\ge}\}$, it fails.
Therefore, $l_{13}^{\ge}$ belongs to the backbone and we can use this to refine the version
space of constraint $c_{13}$, removing from it the constraint types $\le$ and $\top$.

In this example, it is clear that the backbone detection on
$\mathcal{K} \cup \mathcal{R} \cup \mathcal{L}$ has permitted us to detect (and learn) a
redundant constraint that redundancy rules alone did not.



5 Empirical Study 

To compare the approaches to exploiting redundancy to improve the quality of the ac- 
quired CSP that we have proposed in this paper, we studied their effects on a sample 
class of CSP. The bias used in this experiment is the same as that presented in Fig- 
ure 1(a). Our experiments involved generating target CSPs, which we then attempted to 
acquire by presenting examples of solutions and non-solutions of them to an acquisition
system based on either: (a) CONACQ on its own (CONACQ standard); (b) CONACQ
using redundancy rules only (CONACQ + rules); (c) CONACQ using both redundancy
rules and backbone detection (CONACQ + rules + backbone). These form the columns
in Table 3.

In each case we randomly computed a representable set of solutions (non-solutions) 
to the target CSP which were used as a source of positive (respectively, negative) exam- 
ples for the acquisition system. We generated target CSPs with 12 variables, 12 values 
in each domain, 30 constraints and varied the degree of redundancy in them. Clearly, 
during the acquisition process it is not known between which variables there are
constraints so we must assume a complete graph comprising 66 constraints, giving us 66
local version spaces.
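The figure of 66 follows from counting the unordered pairs of the 12 variables; the short check below is only a worked restatement of that arithmetic, with $n$ denoting the number of variables.

```latex
\[
  \binom{n}{2} \;=\; \frac{n(n-1)}{2}
  \quad\Longrightarrow\quad
  \binom{12}{2} \;=\; \frac{12 \times 11}{2} \;=\; 66 .
\]
```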

The number of examples used in each experiment was equal to the number required 
for CONACQ +rules + backbone to converge. However, we set a maximum number of 
examples at 1000 after which we would terminate the acquisition process. 

For each acquisition system setup (the 3 different configurations of CONACQ), we
recorded the total time (in seconds, secs) required to process the set of examples and
the final size of the version space, denoted by $|V|$. The number of examples used is
denoted by #Exs in the last column. We present averages of 10 runs of the experiment.

We have studied the effects of controlling the redundancy in each CSP in two ways 
(giving us the rows in Table 3). Firstly, we introduced patterns of constraints in the 
target network of various lengths. In the experiment we used lengths based on the num- 
ber of variables in the problem: specifically, we use lengths n, n/2 and n/3 ( Length 
column). Secondly, for each length of pattern we selected a pattern of constraints with 
controlled characteristics and introduced these into the target network. In the
experiment we selected patterns of the same constraint selected either from the set
$\{\le, \ge\}$ (looser constraints) or $\{<, =, >\}$ (tighter constraints). For example, a
path of length $n$ based on $\{<, =, >\}$ is $x_1 > x_2 > \ldots > x_{12}$, while a path of
length $n/2$ based on $\{\le, \ge\}$ is $x_1 \le x_2 \le \ldots \le x_6$ (see {constraints}
column). As a “straw-man” we
also present results for a target CSP where no pattern was introduced into the network. 
Our results are presented in Table 3. 

It can be clearly seen from Table 3 that, in terms of the size of the resultant version 
spaces, exploiting redundancy rules with CONACQ improves upon CONACQ alone in all 
situations. However, exploiting redundancy rules leads to an increase in the amount of
time required by the acquisition system to process the set of examples since it relies on
the construction of $\mathcal{R}$, derived from the set of negative examples. Combining
backbone detection and redundancy rules with CONACQ improves upon CONACQ with redundancy
rules, in terms of version space size, but offers a considerably slower response time due
to the use of the SAT solver (see footnote 3).

Furthermore, we can see that as the level of redundancy in the target problem in- 
creases, from n/3 to n, regardless of the constraints involved in the redundant pattern, 
the ability of standard CONACQ to converge deteriorates dramatically. It is also inter- 
esting to note that CONACQ with redundancy rules also does progressively worse on 
these networks. This is most clearly noticeable if one compares the top-line of the ta- 
ble, where no redundant pattern was enforced, with the last line in the table, where a 
pattern of length n was present, keeping the number of examples constant in both cases. 

Simply combining redundancy rules with CONACQ is sufficient to detect much of 
the redundancy that is completely discovered by backbone detection. Specifically, com- 
paring the standard CONACQ column with the CONACQ + rules column, we can see 
that there are orders-of-magnitude differences in the size of the version spaces, with a 
very minor increase in processing time (approximately double in most cases). Note that
CONACQ + rules + backbone requires an order of magnitude more time, but achieves
convergence.

3 The SAT solver used for backbone detection is zChaff, version 2003.12.04,
http://ee.princeton.edu/~chaff/zchaff.php.






Christian Bessiere et al. 



Table 3. Comparison of the capability of each acquisition system to exploit redundancy in the 
larger problem studied (12 variables, 12 values, 30 constraints). 
Finally, the effect of the tightness of the redundancy pattern introduced into the
problem has interesting consequences. On a target network involving looser redundant
patterns, those from $\{\le, \ge\}$, positive instances play a central role in the
acquisition of the problem. Specifically, more of them are required for convergence.
Furthermore, after receiving positive instances, each local version space is smaller than
would be the case if the redundant patterns were made up of the tighter constraints from
$\{=, <, >\}$. For example, when $\ge$ is the target, $\{\ge, \top\}$ is the largest
possible version space, while for $>$ it is $\{>, \neq, \ge, \top\}$. This explains the
exponential difference in version space size between tighter and looser target networks
presented in Table 3.
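The two largest version spaces quoted above can be reproduced with a small sketch. It assumes, for illustration only, that each constraint type is encoded as the set of basic relations it allows, so that the hypotheses still consistent with a target are exactly the types that are at least as general as it; `TYPES` and `largest_version_space` are hypothetical names.

```python
# Constraint types as sets of allowed basic relations (assumed encoding).
TYPES = {
    "<": frozenset("<"), "=": frozenset("="), ">": frozenset(">"),
    "<=": frozenset("<="), ">=": frozenset(">="), "!=": frozenset("<>"),
    "T": frozenset("<=>"),  # the universal constraint
}

def largest_version_space(target):
    """All constraint types at least as general as the target type."""
    return {name for name, rel in TYPES.items() if TYPES[target] <= rel}

looser = largest_version_space(">=")
tighter = largest_version_space(">")
print(sorted(looser), sorted(tighter))
```

With 66 local version spaces, a factor of two per constraint between the looser and tighter cases compounds multiplicatively, which is consistent with the exponential gap noted in the text.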

In summary, this experimental evaluation demonstrates that CONACQ on its own is 
insufficient to fully exploit redundancy during the acquisition process and that conver- 
gence may not be possible. The more sophisticated approaches that we propose based 
on redundancy rules and backbone detection are far superior. However, there is a trade- 
off to be considered between the need to find the tightest specification on the target 
network versus the response time of the acquisition system. 

We have seen that adding backbone detection and redundancy rules together to enhance
CONACQ is best in terms of convergence, but has a high response time cost.
Exploiting redundancy rules alone with CONACQ offers a very fast response time, while
still converging substantially towards the target. Obviously, it is an application-specific
and/or problem-specific issue how this tradeoff should be dealt with. For example, in an 
interactive context, speed-of-response is a critical factor and, therefore, simply relying 
on redundancy rules may be a reasonable compromise. In such an application backbone 
detection could be run as a background process, further refining the version spaces that 
represent the target CSP. 

6 Conclusions and Future Work 

In this paper we were concerned with automating the formulation of constraint sat- 
isfaction problems from examples of solutions and non-solutions. We have combined 
techniques from the fields of machine learning and constraint programming. In par-
ticular we have presented a portfolio of approaches to exploiting the semantics of the 
constraints that we acquire to improve the efficiency of the acquisition process. 

We have demonstrated that the CONACQ algorithm on its own is insufficient to fully 
handle redundancy during the acquisition process. The more sophisticated approaches 
that we propose based on redundancy rules and backbone detection are far superior. 
However, there is a tradeoff to be considered between the need to find the tightest spec- 
ification on the target network versus the response time of the acquisition system. We 
have seen that adding backbone detection and redundancy rules together to enhance 
CONACQ is best but has a high response time cost, while just exploiting redundancy 
rules with CONACQ offers a very fast response time, with the abilities to converge quite 
significantly towards the target CSP. 

Our future work in this area will look at a number of important issues that must 
be addressed in real-world acquisition contexts. For example, techniques for handling 
noise and errors in the process are of critical importance, particularly if human users 
are providing the training examples [9]. 
Abstract. We have started a systematic study of global constraints on set and
multiset variables. We consider here disjoint, partition, and intersection constraints 
in conjunction with cardinality constraints. These global constraints fall into one 
of three classes. In the first class, we show that we can decompose the constraint 
without hindering bound consistency. No new algorithms therefore need be de- 
veloped for such constraints. In the second class, we show that decomposition 
hinders bound consistency but we can present efficient polynomial algorithms 
for enforcing bound consistency. Many of these algorithms exploit a dual view- 
point, and call upon existing global constraints for finite-domain variables like 
the global cardinality constraint. In the third class, we show that enforcing bound 
consistency is NP-hard. We have little choice therefore but to enforce a lesser 
level of local consistency when the size of such constraints grows. 



1 Introduction 

Global (or non-binary) constraints are one of the factors central to the success of con- 
straint programming [7,8, 1]. Global constraints permit the user to model a problem 
easily (by compactly specifying common patterns that occur in many problems) and 
solve it efficiently (by calling fast and effective constraint propagation algorithms). 
Many problems naturally involve sets and multisets. For example, the social golfers 
problem (prob010 at CSPLib.org) partitions a set of golfers into foursomes. Set or
multiset variables have therefore been incorporated into most of the major constraint
solvers (see, for example, [3, 6, 5, 11] for sets, [4] for multisets - under the name bags).
In a recent report, Sadler and Gervet describe a propagator for a global disjoint constraint on
set variables with a fixed cardinality [10]. The aim of this paper is to study other such 
global constraints on set and multiset variables. Using the techniques proposed in [2], 
we have proved that some of these global constraints are NP-hard to propagate. For 
example, both the atmost1-incommon and distinct constraints on sets of fixed
cardinality proposed in [9] are NP-hard to propagate. We prove that others are polyno- 
mial but not decomposable without hindering propagation. We therefore give efficient 
algorithms for enforcing bound consistency on such constraints. 

* The last three authors are supported by Science Foundation Ireland and an ILOG software 
grant. We wish to thank Zeynep Kiziltan for useful comments. 
2 Formal Background

A multiset is an unordered list of elements in which repetition is allowed. We assume 
that the elements of sets and multisets are integers. Basic operations on sets generalize
to multisets. We let $occ(m, X)$ be the number of occurrences of $m$ in the multiset
$X$. Multiset union and intersection are defined by the identities
$occ(m, X \cup Y) = \max(occ(m, X), occ(m, Y))$ and
$occ(m, X \cap Y) = \min(occ(m, X), occ(m, Y))$. Finally, we write $|X|$ for the
cardinality of the set or multiset $X$, and use lower case to denote constants and upper
case to denote variables.
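These identities can be checked directly with Python's `collections.Counter`, whose `|` and `&` operators take the element-wise maximum and minimum of counts. The snippet is only an illustration; the multisets `X` and `Y` and the helper `occ` are made up for the example.

```python
from collections import Counter

def occ(m, X):
    """Number of occurrences of m in the multiset X (0 if absent)."""
    return X[m]

X = Counter([1, 1, 2])        # the multiset {1, 1, 2}
Y = Counter([1, 2, 2, 3])     # the multiset {1, 2, 2, 3}

union = X | Y                 # element-wise maximum of occurrence counts
intersection = X & Y          # element-wise minimum of occurrence counts
cardinality = sum(X.values()) # |X|, counting repetitions

for m in (1, 2, 3):
    assert occ(m, union) == max(occ(m, X), occ(m, Y))
    assert occ(m, intersection) == min(occ(m, X), occ(m, Y))
```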

An integer variable $N$ is a variable whose domain is a set of integers, $dom(N)$.
The minimum (maximum) element of $N$ is denoted by $min(N)$ ($max(N)$). A set
(resp. multiset) variable $X$ is a variable whose domain is a set of sets (resp.
multisets) of integers, given by an upper bound $ub(X)$ and a lower bound $lb(X)$ (i.e.,
$lb(X) \subseteq X \subseteq ub(X)$). We define bound consistency for integer, set and
multiset variables. We can therefore reason about constraints which simultaneously involve
integer, set and multiset variables. An assignment is bound valid if the value given to
each set or multiset variable is within these bounds, and the value given to each integer
variable is between the minimum and maximum integers in its domain. A constraint is bound
consistent (denoted by BC(C)) iff for each set or multiset variable $X$, $ub(X)$ (resp.
$lb(X)$) is the union (resp. intersection) of all the values for $X$ that belong to a bound
valid assignment satisfying the constraint, and for each integer variable $N$, there is a
bound valid assignment that satisfies the constraint for the maximum and minimum values in
the domain of $N$. An alternative definition of BC for set and multiset variables is that
the characteristic function (a vector of 0/1 variables) for each set variable, or the occur- 
rence representation (a vector of integer variables) for each multiset variable is bound 
consistent [12]. We say that a constraint is “decomposable” if there exists a decompo- 
sition into a polynomial number of bounded arity constraints, and this decomposition 
does not hinder bound consistency. We will also use generalized arc consistency (GAC). 
A constraint is GAC iff every value for every variable can be extended to a solution of 
the constraint. 

3 Taxonomy of Global Constraints 

Global constraints over set and multiset variables can be composed from the following 
(more primitive) constraints: 

Cardinality constraints: Many problems involve constraints on the cardinality of a 
set or multiset. For example, each shift must contain at least five nurses. 
Intersection constraints: Many problems involve constraints on the intersection be- 
tween any pair of sets or multisets. For example, shifts must have at least one person 
in common. 

Partition constraints: Many problems involve partitioning a set or multiset. For ex- 
ample, orders must be partitioned to slabs in the steel mill slab design problem. 
Ordering constraints: Many problems involve sets or multisets which are indistin- 
guishable. For example, if each group in the social golfers problem is represented 
by a set then, as groups are symmetric, these sets can be permuted. We can break 
this symmetry by ordering the set variables. 







Counting constraints: We often wish to model situations when there are constraints 
on the number of resources (values) used in a solution. For example, we might have 
set variables for the nurses on each shift and want to count the number of times each 
nurse has a shift during the monthly roster. 

Weight and colour constraints: Many problems involve sets in which there is a weight 
or colour associated with each element of a set and there are constraints on the 
weights or colours in each set. For example, the weight of the set of orders assigned 
to a slab should be less than the slab capacity. 

Tables 1 and 2 summarize some of the results presented in this paper. Given a
collection of set or multiset variables, Table 1 shows different combinations of
restrictions on the cardinality of the intersection of any pair of set or multiset
variables (rows) with constraints restricting the cardinality of each set or multiset
variable (columns). For instance, the top left corner is the Disjoint constraint, in which
all pairs of set or multiset variables are disjoint (i.e., their intersection is empty) and
there is no restriction on the cardinality of the individual sets or multisets. On the
other hand, the NEDisjoint constraint also ensures that each set or multiset is non-empty.
Table 2 is similar to Table 1, except that we also ensure that the set or multiset
variables form a partition. Constraints like Disjoint, Partition, and FCDisjoint on set
variables have already appeared in the literature [3, 5, 9, 10, 4].

All results apply to set or multiset variables unless otherwise indicated. In each 
entry, we name the resulting global constraint, state whether it is tractable to enforce 
BC on it and whether it is decomposable. For example, the FCPartition constraint 
over set variables (see Table 2) is not decomposable but we can maintain BC on it in 
polynomial time. Over multiset variables, the constraint becomes intractable. 



Table 1. Intersection x Cardinality. 
Table 2. Partition + Intersection x Cardinality.

4 Disjoint Constraints

The Disjoint constraint on set or multiset variables is decomposable into binary 
empty intersection constraints without hindering bound consistency [12]. When it is 
over sets, it appears in a number of constraint solvers such as ILOG Solver (under the 
name IlcAllNullIntersect) and ECLiPSe. On multisets, the binary version of
Disjoint appears in ILOG Configurator [4]. 

We now study bound consistency on the NEDisjoint and FCDisjoint constraints
over set and multiset variables. These constraints are not decomposable so we
present algorithms for enforcing BC on them or we prove intractability.

4.1 FCDisjoint 

A filtering algorithm for FCDisjoint over set variables was independently proposed 
in [10]. We give here an alternative polynomial algorithm that uses a dual encoding 
with integer variables (also briefly described at the end of [10]). J.F. Puget has pointed 
out to us that this algorithm is very similar to the propagation algorithm used in ILOG 
Solver for the IlcAllNullIntersect constraint when cardinalities are specified
for the set variables involved, and when the “extended” propagation mode is activated. 
We further show that bound consistency on FCDisjoint is NP-hard on multisets. 

When $X_1, \ldots, X_n$ are set variables and $k_1, \ldots, k_n$ are given constants, we can
achieve BC on a FCDisjoint($X_1, \ldots, X_n, k_1, \ldots, k_n$) constraint as follows:

Algorithm BC-FCD-Sets

1. For all $v \in \bigcup ub(X_i)$, introduce an integer variable $Y_v$ with $dom(Y_v) = \{\}$

2. Initialize the domain of each $Y_v$ as follows:

(a) $dom(Y_v) \leftarrow \{i \mid v \in lb(X_i)\}$

(b) if $|dom(Y_v)| > 1$ then fail

(c) if $|dom(Y_v)| = 0$ then $dom(Y_v) \leftarrow \{i \mid v \in ub(X_i)\} \cup \{n+1\}$ /* n+1 is a dummy */

3. Maintain GAC on gcc($Y$, $\{1..n+1\}$, $B$) where $Y$ is the array of $Y_v$'s, and $B$ is the
array of the corresponding bounds of the $i$'s where for all $i \le n$ we have
$B[i] = k_i..k_i$ and $B[n+1] = 0..\infty$

4. Maintain the following channelling constraints, for all $i \le n$ and for all $v$:

(a) $i \in dom(Y_v) \leftrightarrow v \in ub(X_i)$

(b) $dom(Y_v) = \{i\} \leftrightarrow v \in lb(X_i)$

Remark. gcc($Y$, $\{1..n+1\}$, $B$) is the global cardinality constraint that imposes that in
any assignment $S$ of the variables $Y$, the value $i$ from $\{1..n+1\}$ appears a number of
times in the range $B[i]$. The dummy value $n+1$ is necessary to prevent a failure of the
gcc when a $Y_v$ cannot take any value in $1..n$ (i.e., value $v$ cannot be used by any $X_i$).
We first prove the following lemma. 

Lemma 1. Define the one-to-one mapping between assignments $S$ of the dual variables 
$Y$ and assignments $S'$ of the original set variables $X_i$ by: $v \in S'[X_i]$ iff $S[Y_v] = i$. 
Then $S$ is consistent with gcc in step (3) of BC-FCD-Sets iff $S'$ is consistent for 
FCDisjoint. 
Proof. ($\Rightarrow$) We prove that $S'$ is: 

Disjoint: Each dual variable $Y_v$ has a unique value, say $i$. Therefore in $S'$ a value 
$v$ cannot appear in more than one of the variables $X_1, \ldots, X_n$. In the case where 
$Y_v = n+1$, $v$ does not belong to any set variable assignment. 

Fixed Cardinality: gcc ensures that each value $i$ is used by exactly $k_i$ dual variables 
$Y_v$. Hence, $|S'[X_i]| = k_i$. 

($\Leftarrow$) We prove that $S$ is: 

Consistent with gcc: By construction of $Y$, if $|S'[X_i]| = k_i$ for each $i \in 1..n$, each 
$i$ will appear exactly $k_i$ times in $S$, thus satisfying the gcc. (The dummy value $n+1$ 
has no restriction on its number of occurrences in $Y$.) 

Consistent with Y domains: By construction. □ 
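
The mapping of Lemma 1 can be checked exhaustively on small instances. The sketch below is an illustration only; the helper names (`dual_to_sets`, `satisfies_gcc`, `satisfies_fcdisjoint`, `lemma1_holds`) are hypothetical, and domain restrictions on the $Y_v$ are ignored so that every dual assignment over 1..n+1 is tried.

```python
from itertools import product

def dual_to_sets(S, n):
    """Map a dual assignment S = {v: i} to S' with v in S'[X_i] iff S[Y_v] = i."""
    sets = {i: set() for i in range(1, n + 1)}
    for v, i in S.items():
        if i <= n:              # i == n + 1 is the dummy: v stays unused
            sets[i].add(v)
    return sets

def satisfies_gcc(S, k):
    """Each index i in 1..n must be taken by exactly k[i-1] dual variables."""
    taken = list(S.values())
    return all(taken.count(i) == k[i - 1] for i in range(1, len(k) + 1))

def satisfies_fcdisjoint(sets, k):
    """Pairwise disjoint sets with |X_i| = k[i-1]."""
    sizes_ok = all(len(sets[i]) == k[i - 1] for i in range(1, len(k) + 1))
    total = sum(len(s) for s in sets.values())
    return sizes_ok and total == len(set().union(*sets.values()))

def lemma1_holds(values, k):
    """Compare both sides of Lemma 1 on every dual assignment over 1..n+1."""
    n = len(k)
    for choice in product(range(1, n + 2), repeat=len(values)):
        S = dict(zip(values, choice))
        if satisfies_gcc(S, k) != satisfies_fcdisjoint(dual_to_sets(S, n), k):
            return False
    return True
```

Because each $Y_v$ takes a single value, the sets returned by `dual_to_sets` are disjoint by construction, which is exactly the first half of the proof.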

In the algorithm BC-FCD-Sets, let $d$ be the number of $Y_v$ variables introduced, 
where each $Y_v$ has a domain of size at most $n+1$. 

Theorem 1. BC-FCD-Sets is a sound and complete algorithm for enforcing bound 
consistency on FCDisjoint with set variables, that runs in $O(nd^2)$ time. 

Proof. Soundness. A value $v$ is pruned from $ub(X_i)$ in step (4) of BC-FCD-Sets 
either because $i$ was not put in $dom(Y_v)$ in step (2) or because the gcc has removed $i$ 
from $dom(Y_v)$ in step (3). Lemma 1 tells us that both cases imply that $v$ cannot belong 
to $X_i$ in a satisfying tuple for FCDisjoint. A value $v$ is added to $lb(X_i)$ in step 
(4) if $dom(Y_v) = \{i\}$ after applying GAC on the gcc. From Lemma 1 we deduce 
that any satisfying tuple for FCDisjoint necessarily contains $v$ in $X_i$. We must also 
show that the algorithm does not fail if FCDisjoint can be made bound consistent. 
BC-FCD-Sets can fail in only two different ways. First, it fails in step (2) if a value 
belongs to two different lower bounds. Clearly, FCDisjoint cannot then be made 
bound consistent. Second, it fails in step (3) if the gcc cannot be made GAC. In this 
case, we know by Lemma 1 that FCDisjoint cannot then be made bound consistent. 

Completeness. Let $v \in ub(X_i)$ after step (4). Then, $i \in dom(Y_v)$ after step (3). 
The gcc being GAC, there exists an assignment $S$ satisfying gcc, with $S[Y_v] = i$. 
Lemma 1 guarantees there exists an assignment $S'$ with $\{v\} \subseteq S'[X_i]$. In addition, let 
$v \notin lb(X_i)$ after step (4). Then, there exists $j \in dom(Y_v)$, $j \ne i$, after step (3). Thus, 
there is an assignment $S$ satisfying gcc with $S[Y_v] = j$. Lemma 1 tells us that there is 
a satisfying assignment $S'$ of FCDisjoint with $v$ not in $S'[X_i]$. 

Complexity. Step (1) is in $O(d)$, and step (2) in $O(nd)$. Step (3) has the complexity 
of the gcc, namely $O(nd^2)$ since we have $d$ variables with domains of size at most 
$n+1$. Step (4) is in $O(nd)$. Thus, BC-FCD-Sets is in $O(nd^2)$. □ 
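
To make the four steps concrete, here is a minimal reference sketch of the whole filtering on sets. It is not the polynomial algorithm analysed above: GAC on the gcc is replaced by exhaustive enumeration of dual assignments, which is exponential and only meant for cross-checking tiny instances. The function name and the list-of-sets representation are our assumptions.

```python
from itertools import product

def bc_fcd_sets_bruteforce(lb, ub, k):
    """Filter FCDisjoint(X_1..X_n, k_1..k_n) on sets via the dual encoding.

    lb, ub: lists of sets; k: list of cardinalities.
    Returns (new_lb, new_ub), or None on failure.
    """
    n = len(ub)
    values = list(set().union(*ub))
    doms = []
    for v in values:                                   # steps 1-2
        d = {i for i in range(1, n + 1) if v in lb[i - 1]}
        if len(d) > 1:
            return None
        if not d:
            d = {i for i in range(1, n + 1) if v in ub[i - 1]} | {n + 1}
        doms.append(sorted(d))
    # Step 3 stand-in: collect, for each Y_v, the indices that appear in
    # at least one assignment satisfying the gcc (exhaustive search).
    supported = [set() for _ in values]
    found = False
    for S in product(*doms):
        if all(S.count(i) == k[i - 1] for i in range(1, n + 1)):
            found = True
            for pos, i in enumerate(S):
                supported[pos].add(i)
    if not found:
        return None
    # Step 4: channel the dual domains back to the set bounds.
    new_lb = [{v for v, s in zip(values, supported) if s == {i}}
              for i in range(1, n + 1)]
    new_ub = [{v for v, s in zip(values, supported) if i in s}
              for i in range(1, n + 1)]
    return new_lb, new_ub
```

With ub(X_1) = {1, 2}, ub(X_2) = {2}, empty lower bounds and k_1 = k_2 = 1, the only solution is X_1 = {1}, X_2 = {2}, so both bounds collapse to these sets.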

Theorem 2. Enforcing bound consistency on FCDisjoint with multiset variables is 
NP-hard. 

Proof. We transform 3SAT into the problem of the existence of a satisfying assignment 
for FCDisjoint. Let $F = \{c_1, \ldots, c_m\}$ be a 3CNF on the Boolean variables 
$x_1, \ldots, x_n$. We build the constraint FCDisjoint($X_1, \ldots, X_{3n+m}, k_1, \ldots, k_{3n+m}$) 
as follows. Each time a Boolean variable $x_i$ appears positively (resp. negatively) in a 
clause $c_j$, we create a value $v_i^j$ (resp. $w_i^j$). For each Boolean variable $x_i$, we create two 
values $p_i$ and $n_i$. Then, we build the $3n+m$ multiset variables as follows. 

1. $\forall i \in 1..n$, /* $X_i$ will take the $p_i$'s iff $x_i = 1$ */ 

   (a) $k_i$ = number of occurrences of $x_i$ in a clause 

   (b) $\{\} \subseteq X_i \subseteq \{v_i^j \mid x_i \in c_j\} \cup \{p_i, \ldots, p_i\}$ /* $k_i$ copies of $p_i$ */ 

2. $\forall i \in n+1..2n$, /* $X_i$ will take the $n_i$'s iff $x_i = 0$ */ 

   (a) $k_i$ = number of occurrences of $\neg x_i$ in a clause 

   (b) $\{\} \subseteq X_i \subseteq \{w_i^j \mid \neg x_i \in c_j\} \cup \{n_i, \ldots, n_i\}$ /* $k_i$ copies of $n_i$ */ 

3. $\forall i \in 2n+1..3n$, /* $X_i$ forces $X_{i-n}$ and $X_{i-2n}$ to be consistent */ 

   (a) $k_i = 1$ 

   (b) $\{\} \subseteq X_i \subseteq \{n_i, p_i\}$ 

4. $\forall j \in 1..m$, /* $X_{3n+j}$ represents the clause $c_j$ */ 

   (a) $k_{3n+j} = 1$ 

   (b) $\{\} \subseteq X_{3n+j} \subseteq \{v_{i_1}^j, w_{i_2}^j, v_{i_3}^j\}$ if $c_j = x_{i_1} \vee \neg x_{i_2} \vee x_{i_3}$ 

Let $M$ be a model of $F$. We build the assignment $S$ on the $X_i$'s such that $\forall i \in 1..n$, 
if $M[x_i] = 1$ then $S[X_i] = \{p_i, \ldots, p_i\}$, $S[X_{i+n}] = \{w_i^j \in ub(X_{i+n})\}$, $S[X_{i+2n}] = 
\{n_i\}$, else $S[X_i] = \{v_i^j \in ub(X_i)\}$, $S[X_{i+n}] = \{n_i, \ldots, n_i\}$, $S[X_{i+2n}] = \{p_i\}$. 

By construction, the cardinalities $k_i$ and the disjointness are satisfied 
on $X_1, \ldots, X_{3n}$. In addition, the construction ensures that if a Boolean variable $x_i$ is 
true in $M$ (resp. false in $M$) none of the $v_i^j$ (resp. $w_i^j$) are used and all the $w_i^j$ (resp. $v_i^j$) 
are already taken by $X_1, \ldots, X_{3n}$. Thus, $\forall j \in 1..m$, $S[X_{3n+j}]$ is assigned one of the 
values $v_i^j$ or $w_i^j$ representing a true literal $x_i$ or $\neg x_i$ in $M$. And $M$ being a 3SAT model, 
we are sure that there exist such values not already taken by $X_1, \ldots, X_{3n}$. Therefore, 
$S$ satisfies FCDisjoint. 

Consider now an assignment $S$ of the $X_i$'s consistent with FCDisjoint. Build the 
interpretation $M$ such that $M[x_i] = 1$ iff $S[X_{i+2n}] = \{n_i\}$. Thanks to the disjointness 
and cardinalities among $X_1, \ldots, X_{3n}$, we guarantee that if $S[X_{i+2n}] = \{n_i\}$ all the $w_i^j$ 
are already taken by $X_{i+n}$, and if $S[X_{i+2n}] = \{p_i\}$ all the $v_i^j$ are already taken by $X_i$, 
so that they cannot belong to any $X_{3n+j}$. But $S$ satisfying FCDisjoint, we know that 
for each $j \in 1..m$, $X_{3n+j}$ is assigned a value consistent with $X_1, \ldots, X_{3n}$. Therefore, 
$M$ is a model of $F$. 

As a result, deciding the existence of a satisfying assignment for FCDisjoint 
with multiset variables is NP-complete. Then, deciding whether GAC finds a wipe out 
on the occurrence representation is coNP-complete. In addition, on the transformation 
we use, if GAC detects a wipe out then BC does¹ (because of the way the $p_i$ and $n_i$ values 
are set). So, deciding whether BC detects a wipe out is coNP-complete, and enforcing 
bound consistency on FCDisjoint with multiset variables is NP-hard. □ 

4.2 NEDisjoint 

The constraint NEDisjoint($X_1, \ldots, X_n$) on set variables can be seen as a particular 
case of the constraint FCDisjoint in which the cardinality of the variables $X_i$ can vary 
from 1 to $\infty$ instead of being fixed to $k_i$. Since BC-FCD-Sets is written so that 
such an interval of values can be expressed for the cardinality of the set 
variables $X_i$, the algorithm BC-NED-Sets is a very simple modification of it. In step 
(3) of BC-FCD-Sets it is indeed sufficient to set $B[i]$ to $1..\infty$ instead of $k_i..k_i$, 
for $1 \le i \le n$. J.F. Puget has pointed out to us that the IlcAllNullIntersect 
constraint in “extended” mode will also achieve BC on non-empty set variables. 

¹ GAC on the occurrence representation of multisets is in general not equivalent to BC (whilst 
on sets it is). If $ub(X_1) = ub(X_2) = \{1, 1, 2, 2\}$, and $k_1 = k_2 = 2$, GAC on the occurrence 
representation of FCDisjoint removes the possibility for $X_1$ to have 1 occurrence of 1. BC 
does not remove anything since the bounds 0 and 2 for $occ(1, X_1)$ are consistent. 
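
The example given in footnote 1 can be reproduced by brute force. The following sketch assumes multisets are modelled with `collections.Counter` and that two multisets are disjoint when their multiset intersection is empty; the helper `sub_multisets` is a hypothetical name introduced for this illustration.

```python
from collections import Counter
from itertools import product

ub = Counter({1: 2, 2: 2})   # ub(X_1) = ub(X_2) = {1, 1, 2, 2}
k = 2                        # k_1 = k_2 = 2

def sub_multisets(bound, size):
    """Yield every multiset of the given cardinality contained in `bound`."""
    vals = sorted(bound)
    for occ in product(*(range(bound[v] + 1) for v in vals)):
        if sum(occ) == size:
            yield Counter(dict(zip(vals, occ)))

# All solutions of FCDisjoint(X_1, X_2, 2, 2) within the bounds above.
solutions = [
    (x1, x2)
    for x1 in sub_multisets(ub, k)
    for x2 in sub_multisets(ub, k)
    if not (x1 & x2)         # empty multiset intersection
]

# Numbers of occurrences of value 1 in X_1 that have a support.
occ_1_in_x1 = sorted({x1[1] for x1, _ in solutions})
```

Only 0 and 2 occurrences of the value 1 in $X_1$ are supported: the interior count 1 is what GAC on the occurrence representation removes, while the bounds 0 and 2 that BC reasons about are untouched.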

When NEDisjoint involves multiset variables, BC remains polynomial. In fact, it 
is sufficient to transform the multisets into sets and to use BC-NED-Sets on the obtained 
sets. Once BC is achieved on these sets, we just have to restore the initial number of 
occurrences, noted init-occ, for each remaining value. The cardinalities of the multisets 
are not bounded above, so that if one value has support, any number of occurrences of 
the same value also has support. 

Algorithm BC-NED-Msets 

1. for each $i \in 1..n$, $v$ occurring in $ub(X_i)$ do 

   init-occ$_{ub}$($X_i$, $v$) $\leftarrow occ(v, ub(X_i))$; $occ(v, ub(X_i)) \leftarrow 1$ 

   init-occ$_{lb}$($X_i$, $v$) $\leftarrow occ(v, lb(X_i))$; $occ(v, lb(X_i)) \leftarrow \min(1, $ init-occ$_{lb}$($X_i$, $v$)$)$ 

2. BC-NED-Sets($X_1, \ldots, X_n$) 

3. for each $i \in 1..n$, $v \in ub(X_i)$ do 

   $occ(v, ub(X_i)) \leftarrow$ init-occ$_{ub}$($X_i$, $v$) 

   if $v \in lb(X_i)$ then $occ(v, lb(X_i)) \leftarrow \max(1, $ init-occ$_{lb}$($X_i$, $v$)$)$ 
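
The flatten, propagate and restore pattern of BC-NED-Msets can be sketched as follows. The sketch assumes multiset bounds are `collections.Counter` objects and takes the set-level propagator as a caller-supplied function `bc_ned_sets`; that parameter, its signature and the function name `bc_ned_msets` are hypothetical and stand in for BC-NED-Sets, which is not implemented here.

```python
from collections import Counter

def bc_ned_msets(lb, ub, bc_ned_sets):
    """lb, ub: lists of Counters bounding the multiset variables X_1..X_n.

    bc_ned_sets(set_lb, set_ub) must return the filtered
    (list of lower-bound sets, list of upper-bound sets), or None on failure.
    """
    # Step 1: remember the occurrence counts, then keep one copy per value.
    init_ub = [Counter(u) for u in ub]
    init_lb = [Counter(l) for l in lb]
    set_lb = [{v for v, c in l.items() if c > 0} for l in lb]
    set_ub = [{v for v, c in u.items() if c > 0} for u in ub]
    # Step 2: propagate on the flattened sets.
    result = bc_ned_sets(set_lb, set_ub)
    if result is None:
        return None
    set_lb, set_ub = result
    # Step 3: restore the occurrence counts of the surviving values.
    new_ub = [Counter({v: init_ub[i][v] for v in s})
              for i, s in enumerate(set_ub)]
    new_lb = [Counter({v: max(1, init_lb[i][v]) for v in s})
              for i, s in enumerate(set_lb)]
    return new_lb, new_ub
```

A value that the set-level propagator newly adds to a lower bound receives one occurrence, matching the max(1, ...) in step 3 of the algorithm.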



5 Partition Constraints 

The Partition constraint is decomposable into binary empty intersection constraints 
and ternary union constraints involving $n$ additional variables without hindering bound 
consistency [12]. It appears in a number of constraint solvers such as ILOG Solver 
(under the name IlcPartition) and ECLiPSe when it is over sets. On the other 
hand, the non-empty and fixed cardinality partition constraints are not decomposable. 
We therefore either present algorithms for enforcing BC on them or prove intractability. 



5.1 FCPartition 

It is polynomial to enforce BC on the FCPartition constraint on set variables, but 
NP-hard on multisets. On set variables, enforcing BC on FCPartition is very similar 
to enforcing BC on FCDisjoint. Indeed, if the set $X$ being partitioned is fixed, then 
we can simply decompose a fixed cardinality partition constraint into a fixed cardinality 
disjoint, union and cardinality constraints without hindering bound consistency.² If $X$ 
is not fixed, we need to do slightly more reasoning to ensure that the $X_i$'s are a partition 
of $X$. We present here the additional lines necessary to deal with this. 

Line numbers with a prime represent lines modified with respect to BC-FCD-Sets. The 
others are additional lines. 

² As in the FCDisjoint case, J.F. Puget tells us that the filtering algorithm of the 
IlcPartition constraint in [5] uses a similar approach when the “extended” mode is set. 
Algorithm BC-FCP-Sets 

1'. For all $v \in ub(X)$, introduce an integer variable $Y_v$ with $dom(Y_v) = \{\}$ 

2. Initialize the domain of each $Y_v$ as follows: 

   (c') if $|dom(Y_v)| = 0$ then $dom(Y_v) \leftarrow \{i \mid v \in ub(X_i)\}$ 

   (d) if $v \notin lb(X)$ then $dom(Y_v) \leftarrow dom(Y_v) \cup \{n+1\}$ 

   (e) if $|dom(Y_v)| = 0$ then fail 

4. Maintain the following channelling constraints, for all $i \le n$ and for all $v$: 

   (c) $n+1 \notin dom(Y_v) \leftrightarrow v \in lb(X)$ 

   (d) $ub(X) \subseteq \bigcup_i ub(X_i)$ 

Lemma 2. Define the one-to-one mapping between assignments $S$ of the dual variables 
$Y$ and assignments $S'$ of the original set variables $X_i$ and $X$ by: $S'[X] = \bigcup_i S'[X_i]$ and 
$v \in S'[X_i]$ iff $S[Y_v] = i$. Then $S$ is consistent with gcc in step (3) of BC-FCP-Sets 
iff $S'$ is consistent for FCPartition. 

Proof. ($\Rightarrow$) We prove that $S'$ is: 

Disjoint and Fixed Cardinality: See Lemma 1. 

Partition: Lines (2.c'-d) guarantee that for a value $v \in lb(X)$, $Y_v$ cannot be assigned 
the dummy value $n+1$ in $S$. Hence, $S'$ necessarily has an $X_i$ with $v \in S'[X_i]$. Because 
of line (1'), none of the $Y_v$ represent a value $v \notin ub(X)$. Hence, for all $i$, $S'[X_i] \subseteq 
ub(X)$, and then $S'[X] \subseteq ub(X)$. 

($\Leftarrow$) We prove that $S$ is: 

Consistent with gcc: See Lemma 1. 

Consistent with Y: If $S'$ is a satisfying assignment for FCPartition, $S'[X_i] \subseteq 
S'[X]$, $\forall i$. Since $S'[X] \subseteq ub(X)$, we know that any value $v$ appearing in $S'$ has a 
corresponding variable $Y_v$. And by construction (lines 2.a, 2.c', 2.d), we know that $S$ is 
consistent with the $Y$ domains. □ 

Theorem 3. BC-FCP-Sets is a sound and complete algorithm for enforcing bound 
consistency on FCPartition with set variables that runs in $O(nd^2)$ time. 

Proof. Soundness. A value $v$ is pruned from $ub(X_i)$ in step (4) of BC-FCP-Sets 
for one of the reasons that already held in FCDisjoint or because $Y_v$ has not been 
created in line (1'). Lemma 2 tells us that all cases imply that $v$ cannot belong to $X_i$ 
in a satisfying tuple for FCPartition. Soundness of $lb(X_i)$ comes from Lemma 2 
as it came from Lemma 1 on FCDisjoint. We must also show that the algorithm 
does not fail if FCPartition can be made bound consistent. BC-FCP-Sets can 
fail in line (2.e) if a value $v$ that belongs to $lb(X)$ cannot belong to any $X_i$. Clearly, 
FCPartition cannot then be made bound consistent. The other cases of failure are 
the same as for FCDisjoint. A value $v$ is pruned from $ub(X)$ in step (4.d) because 
none of the $X_i$ contains $v$ in its upper bound. This means that this value cannot belong 
to any satisfying assignment $S'$ (Lemma 2). A value $v$ is added to $lb(X)$ in line (4.c) 
if no assignment $S$ satisfying the gcc verifies $S[Y_v] = n+1$. This means that $v$ is 
assigned to a variable in all assignments satisfying FCPartition. 
Completeness. Let $v \in ub(X)$ after step (4). Then, there exists $X_i$ with $v \in ub(X_i)$, 
and so $i \in dom(Y_v)$ after step (3). gcc being GAC, there exists an assignment $S$ 
satisfying gcc, with $S[Y_v] = i$. Lemma 2 guarantees there exists an assignment $S'$ 
with $\{v\} \subseteq S'[X]$. Thus, $v$ is in $ub(X)$. In addition, let $v \notin lb(X)$ after step (4). 
Then, $n+1 \in dom(Y_v)$ after step (3). Thus, there is an assignment $S$ satisfying 
gcc with $S[Y_v] = n+1$. Lemma 2 tells us that there is a satisfying assignment $S'$ 
of FCPartition with $v$ not in $S'[X]$. 

Complexity. See proof of BC-FCD-Sets. □ 

Theorem 4. Enforcing BC on FCPartition($X_1, \ldots, X_n, X, k_1, \ldots, k_n$) with 
multiset variables is NP-hard. 

Proof. We know that deciding the existence of a satisfying assignment is NP-complete 
for FCDisjoint($X_1, \ldots, X_n, k_1, \ldots, k_n$) with multiset variables. If we build a 
multiset variable $X$ with $lb(X) = \emptyset$ and $ub(X) = \bigcup_i ub(X_i)$, then 
FCPartition($X_1, \ldots, X_n, X, k_1, \ldots, k_n$) has a satisfying assignment if and only if 
FCDisjoint($X_1, \ldots, X_n, k_1, \ldots, k_n$) has one. Thus, enforcing bound consistency on 
FCPartition is NP-hard. □ 

5.2 NEPartition 

The constraint NEDisjoint($X_1, \ldots, X_n$) on set variables was a particular case of 
FCDisjoint in which the cardinality of the variables $X_i$ can vary from 1 to $\infty$ instead 
of being fixed to $k_i$. This is exactly the same for NEPartition on set variables, which 
is a particular case of FCPartition. Replacing “$B[i] \leftarrow k_i..k_i$” by “$B[i] \leftarrow 1..\infty$” 
in BC-FCP-Sets, we obtain BC-NEP-Sets. 

When NEPartition involves multiset variables, BC remains polynomial. As for 
BC-NED-Msets, the trick is to transform the multisets into sets and to use BC-NEP-Sets 
on the obtained sets. We just need to be careful with the compatibility of the occurrences 
of values in the $X_i$ variables and the $X$ being partitioned. Once BC is achieved on these 
sets, we have to restore the initial number of occurrences and check compatibility 
with $X$ again. 

Algorithm BC-NEP-Msets 

1. if $\bigcup_i lb(X_i) \not\subseteq ub(X)$ or $lb(X) \not\subseteq \bigcup_i ub(X_i)$ then failure 

2. for each $i \in 1..n$, $v$ occurring in $ub(X_i)$ do 

   2.1. if $occ(v, ub(X_i)) < occ(v, lb(X))$ then $occ(v, ub(X_i)) \leftarrow 0$ 

   2.2. if $occ(v, ub(X_i)) > occ(v, ub(X))$ then $occ(v, ub(X_i)) \leftarrow occ(v, ub(X))$ 

   2.3. init-occ$_{ub}$($X_i$, $v$) $\leftarrow occ(v, ub(X_i))$; $occ(v, ub(X_i)) \leftarrow 1$ 

   2.4. init-occ$_{lb}$($X_i$, $v$) $\leftarrow occ(v, lb(X_i))$; $occ(v, lb(X_i)) \leftarrow \min(1, $ init-occ$_{lb}$($X_i$, $v$)$)$ 

3. store $lb(X)$; $lb(X) \leftarrow$ set-of($lb(X)$); $ub(X) \leftarrow$ set-of($ub(X)$) 

4. BC-NEP-Sets($X_1, \ldots, X_n, X$) 

5. restore $lb(X)$ 

6. for each $i \in 1..n$, $v \in ub(X_i)$ do 

   6.1. $occ(v, ub(X_i)) \leftarrow$ init-occ$_{ub}$($X_i$, $v$) 

   6.2. if $v \in lb(X_i)$ then $occ(v, lb(X_i)) \leftarrow \max(1, $ init-occ$_{lb}$($X_i$, $v$)$, occ(v, lb(X)))$ 

7. $lb(X) \leftarrow lb(X) \cup \bigcup_i lb(X_i)$; $ub(X) \leftarrow \bigcup_i ub(X_i)$ 
Theorem 5. BC-NEP-Msets is a sound and complete algorithm for enforcing bound 
consistency on NEPartition with multiset variables, that runs in $O(nd^2)$ time. 

Proof (Sketch). As for NEDisjoint on multiset variables, we enforce bound consistency 
on NEPartition after having transformed the multisets into sets (i.e., we keep 
only one occurrence of each value in the lower and upper bounds). The removal of a 
value $v$ from an upper bound by BC-NEP-Sets is then a sufficient condition for the 
removal of all occurrences of $v$ in the original multiset upper bound. The addition of $v$ to a 
lower bound is a sufficient condition for the addition of some occurrences of $v$ in the 
lower bound (the right number depends on the number of occurrences of $v$ in $lb(X)$ 
and in the lower bound of the $X_i$ holding $v$). It is then sufficient to ensure consistency 
between the number of occurrences in the $X_i$ and $X$ (lines 1, 2.1, 2.2, and 7), to transform 
multisets into sets (lines 2.3, 2.4, and 3), to call BC-NEP-Sets (line 4), and to restore 
appropriate numbers of occurrences (lines 5 and 6). Line 1 guarantees that $ub(X)$ 
can cover all the $X_i$'s lower bounds and that $lb(X)$ can be covered by the $X_i$'s upper 
bounds. A value $v$ can be assigned in $X_i$ if and only if it can cover the occurrences of $v$ 
in $lb(X)$ (line 2.1), and it cannot occur more than in $ub(X)$ (line 2.2). Finally, a value $v$ 
occurs in $lb(X)$ at least as many times as it occurs in some $lb(X_i)$, and occurs in $ub(X)$ 
exactly as many times as in the $ub(X_i)$ having its greatest number of occurrences (line 
7). The complexity is dominated by line 4, with the call to BC-NEP-Sets which is 
$O(nd^2)$. □ 

6 Intersection Constraints 

The Disjoint constraint restricts the pair-wise intersection of any two set or multiset 
variables to the empty set. We now consider the cases where the cardinality of the 
pair-wise intersection is either bounded or equal to a given constant or integer variable 
(lower case characters denote constants while upper case denote variables): 

Intersect$_\le$($X_1, \ldots, X_n, K$) iff $|X_i \cap X_j| \le K$ for any $i \ne j$. 
Intersect$_\ge$($X_1, \ldots, X_n, K$) iff $|X_i \cap X_j| \ge K$ for any $i \ne j$. 
Intersect$_=$($X_1, \ldots, X_n, K$) iff $|X_i \cap X_j| = K$ for any $i \ne j$. 

As usual, we can also add non-emptiness and fixed cardinality constraints to the 
set or multiset variables. For example, FCIntersect$_\le$($X_1, \ldots, X_n, K, C$) iff 
$|X_i \cap X_j| \le K$ for any $i \ne j$ and $|X_i| = C$ for all $i$. If $K = 0$, Intersect$_\le$ and 
Intersect$_=$ are equivalent to Disjoint. 
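
As a small executable reading of these definitions, the checkers below test a complete assignment (not bounds) against each constraint. They assume sets and multisets are both modelled as `collections.Counter` objects, so that `&` is the multiset intersection; all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairwise_intersection_sizes(xs):
    """Cardinalities |X_i ∩ X_j| for all i < j; each X is a Counter."""
    return [sum((a & b).values()) for a, b in combinations(xs, 2)]

def intersect_le(xs, k):
    """Intersect_<=: every pairwise intersection has at most k elements."""
    return all(s <= k for s in pairwise_intersection_sizes(xs))

def intersect_ge(xs, k):
    """Intersect_>=: every pairwise intersection has at least k elements."""
    return all(s >= k for s in pairwise_intersection_sizes(xs))

def intersect_eq(xs, k):
    """Intersect_=: every pairwise intersection has exactly k elements."""
    return all(s == k for s in pairwise_intersection_sizes(xs))

def fc_intersect_le(xs, k, c):
    """FCIntersect_<=: Intersect_<= plus |X_i| = c for all i."""
    return intersect_le(xs, k) and all(sum(x.values()) == c for x in xs)
```

With $K = 0$, `intersect_le` and `intersect_eq` accept exactly the pairwise disjoint assignments, in line with the remark above.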

6.1 At Most Intersection Constraints 

We show that Intersect$_\le$ and NEIntersect$_\le$ can be decomposed without 
hindering bound consistency, but that it is NP-hard to enforce BC on FCIntersect$_\le$. 

Theorem 6. BC(Intersect$_\le$($X_1, \ldots, X_n, K$)) is equivalent to BC($|X_i \cap X_j| \le 
K$) for all $i < j$. 

Proof. Suppose BC($|X_i \cap X_j| \le K$) for all $i < j$. We will show that BC($\forall i < j.\ |X_i \cap 
X_j| \le K$). Consider the occurrence representation of the set or multiset variables. Let 
$X_{il}$ be the number of occurrences of the value $l$ in $X_i$. Consider the upper bound on 
$X_{il}$. We will construct a support for this value for $X_{il}$ that simultaneously satisfies 
$|X_i \cap X_j| \le K$ for all $i < j$. The same support will work for the lower bound on 
$X_{il}$. First, we assign $K$ its upper bound. Then we pick any $j$ with $i \ne j$. As 
BC($|X_i \cap X_j| \le K$), there is some assignment for $X_{jn}$ and $X_{im}$ ($l \ne m$) within their 
bounds that satisfies $|X_i \cap X_j| \le K$. We now extend these assignments to get a complete 
assignment for every other set or multiset variable as follows. Every other $X_{pq}$ ($p \ne i$ 
and $p \ne j$) is assigned its lower bound. This can only help satisfy $|X_i \cap X_j| \le K$ for 
all $i < j$. This assignment is therefore a support for $X_{il}$. We can also construct a support 
for the upper or lower bound of $K$ in a similar way. Maximality of the bound consistent 
domains is easy. Consider any value for $X_{il}$ smaller than the lower bound or larger than 
the upper bound. As this cannot be extended to satisfy $|X_i \cap X_j| \le K$ for some $j$, it 
clearly cannot be extended to satisfy $|X_i \cap X_j| \le K$ for all $i < j$. A similar argument 
holds for any value for $K$ smaller than the lower bound or larger than the upper bound. 
Hence, BC($\forall i < j.\ |X_i \cap X_j| \le K$). □ 

Given a set of set or multiset variables, the non-empty intersection constraint 
NEIntersect$_\le$($X_1, \ldots, X_n, K$) ensures that $|X_i \cap X_j| \le K$ for $i \ne j$ and $|X_i| > 0$ 
for all $i$. If $K = 0$, this is the NEDisjoint constraint which is not decomposable. If 
$K > 0$, the constraint is decomposable. 

Theorem 7. If $K > 0$ then BC(NEIntersect$_\le$($X_1, \ldots, X_n, K$)) is equivalent to 
BC($|X_i \cap X_j| \le K$) for all $i < j$ and BC($|X_i| > 0$) for all $i$. 

Proof. Suppose BC($|X_i \cap X_j| \le K$) for all $i < j$ and BC($|X_i| > 0$) for all $i$. 
Then $|lb(X_i) \cap lb(X_j)| \le \max(K)$ for all $i < j$. And if $|ub(X_i)| = 1$ for any $i$ then 
$lb(X_i) = ub(X_i)$. Consider some variable $X_i$ and any value $a \in ub(X_i) - lb(X_i)$. 
We will find support in the global constraint for $X_i$ to take the value $\{a\} \cup lb(X_i)$. 
Consider any other variable $X_j$. If $|lb(X_j)| = 0$ then we pick any value $b \in ub(X_j)$ 
and set $X_j$ to $\{b\}$. This will ensure we satisfy the non-emptiness constraint on $X_j$. As 
$K > 0$ and $|X_j| = 1$, we will satisfy the intersection constraint between $X_j$ and any 
other variable. If $|lb(X_j)| > 0$ then we set $X_j$ to $lb(X_j)$. This again satisfies the 
non-emptiness constraint on $X_j$. Since $|lb(X_i) \cap lb(X_j)| \le \max(K)$ for all $i < j$, we 
will also satisfy the intersection constraints. Support can be found in a similar way for 
$X_i$ to take the value $lb(X_i)$ if this is non-empty. Finally, $\min(K)$ has support since 
BC($|X_i \cap X_j| \le K$) for all $i < j$. Hence NEIntersect$_\le$($X_1, \ldots, X_n, K$) is BC. □ 

Enforcing BC on FCIntersect$_\le$ is intractable. 

Theorem 8. Enforcing BC on FCIntersect$_\le$($X_1, \ldots, X_n, k, c$) for $c > k > 0$ is 
NP-hard. 

Proof. Immediate from Theorem 5 in [2]. □ 

Sadler and Gervet introduce the atmost1-incommon and distinct constraints 
for set variables with a fixed cardinality [9]. The atmost1-incommon constraint 
is FCIntersect$_\le$($X_1, \ldots, X_n, 1, c$). Similarly, the distinct constraint on 
sets of fixed cardinality is FCIntersect$_\le$($X_1, \ldots, X_n, c - 1, c$). The reduction 
used in Theorem 5 in [2] works with all these parameters. Hence, all are NP-hard to 
propagate. 
6.2 At Least Intersection Constraints 

Similar to the at most intersection constraint, Intersect$_\ge$ and NEIntersect$_\ge$ can 
be decomposed without hindering bound consistency. However, it is NP-hard to enforce 
BC on FCIntersect$_\ge$. 

Theorem 9. BC(Intersect$_\ge$($X_1, \ldots, X_n, K$)) is equivalent to BC($|X_i \cap X_j| \ge 
K$) for all $i < j$. 

Proof. The proof is analogous to that of Theorem 6 except we extend a partial assignment 
to a complete assignment that is interval support by assigning each of the additional 
$X_{pq}$ its upper bound and (where appropriate) $K$ its lower bound. □ 

Two sets cannot have an intersection unless they are non-empty. Hence this result 
also shows that BC(NEIntersect$_\ge$($X_1, \ldots, X_n, K$)) for $K > 0$ is equivalent to 
BC on the decomposition. By comparison, enforcing BC on FCIntersect$_\ge$ is 
intractable. 

Theorem 10. Enforcing BC on FCIntersect$_\ge$($X_1, \ldots, X_n, k, c$) for $c > k > 0$ is 
NP-hard. 

Proof. We let $k = 1$. We can reduce the $k = 1$ case to the $k > 1$ case by adding $k - 1$ 
additional common values to each set variable. The proof again uses a reduction of a 
3SAT problem in $n$ variables. The same reduction is used for set or multiset variables. 
We let $c = n$ and introduce a set variable, $S$, with domain $\{\} \subseteq S \subseteq \{1, \neg 1, \ldots, n, \neg n\}$. 
This will be the set of literals assigned true in a satisfying assignment. For each clause $\varphi$, 
we introduce a set variable, $X_\varphi$. Suppose $\varphi = x_i \vee \neg x_j \vee x_k$; then $X_\varphi$ has domain 
$\{d_1^\varphi, \ldots, d_{n-1}^\varphi\} \subseteq X_\varphi \subseteq \{i, \neg j, k, d_1^\varphi, \ldots, d_{n-1}^\varphi\}$, where $d_1^\varphi, \ldots, d_{n-1}^\varphi$ are dummy 
values. To satisfy the intersection and cardinality constraint, $S$ must take at least one 
of the literals which satisfy $\varphi$. Finally, we introduce $n$ set variables, $X_i$, to ensure that 
one and only one of $i$ and $\neg i$ is in $S$. Each $X_i$ has domain $\{f_1^i, \ldots, f_{n-1}^i\} \subseteq X_i \subseteq 
\{f_1^i, \ldots, f_{n-1}^i, i, \neg i\}$. The constructed set variables then have a solution which satisfies 
the intersection and cardinality constraints iff the original 3SAT problem is satisfiable. 
Hence enforcing bound consistency is NP-hard. □ 

6.3 Equal Intersection Constraints 

Unlike the at most or at least intersection constraints, enforcing BC on Intersect$_=$ 
is intractable even without cardinality constraints on the set or multiset variables. 

Theorem 11. Enforcing BC on Intersect$_=$($X_1, \ldots, X_n, k$) is NP-hard for $k > 0$. 

Proof. Immediate from Theorem 6 in [2]. □ 

The same reduction can also be used with the constraint that each set or multiset 
has a non-empty or fixed cardinality. 

Lemma 3. Enforcing BC on FCIntersect$_=$($X_1, \ldots, X_n, k, c$) is NP-hard for $k > 0$. 

Lemma 4. Enforcing BC on NEIntersect$_=$($X_1, \ldots, X_n, k$) is NP-hard for $k > 0$. 
7 Experimental Results 

To show the benefits of these global constraints, we ran some experiments using ILOG’s 
Solver toolkit with a popular benchmark involving set variables. The social golfers 
problem $(p, m, n, t)$ is to schedule $t$ golfers into $m$ groups of size $n$ for $p$ weeks, such 
that no golfer plays in the same group as any other golfer twice. To model this problem, 
we introduce a set variable of fixed cardinality to represent every group in each 
week. Each week is then a partition of the set of golfers. Between any two groups, their 
intersection must contain at most one golfer. We also consider a generalization of the 
problem in which there is an excess of golfers and some golfers rest each week. When 
there is no excess of golfers, FCPartition shows no improvement upon its decomposition 
into ILOG’s IlcPartition and cardinality constraints. When there is an 
excess of golfers, the partitioning constraint is replaced by a disjointness constraint. 
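
The model just described can be read as the following solution checker. It is a sketch for a complete schedule only, not the ILOG Solver model used in the experiments; the function name `check_social_golfers` and the list-of-sets representation are our assumptions. Weekly groups are required to be disjoint subsets of the golfers, which amounts to a partition when $t = m \cdot n$ and leaves some golfers resting when there is an excess.

```python
from itertools import combinations

def check_social_golfers(schedule, m, n, golfers):
    """schedule: one list of m groups (Python sets of golfers) per week."""
    for week in schedule:
        # Fixed cardinality: m groups of exactly n golfers each.
        if len(week) != m or any(len(g) != n for g in week):
            return False
        # Disjointness within the week, and only known golfers are used.
        union = set().union(*week)
        if len(union) != m * n or not union <= golfers:
            return False
    groups = [g for week in schedule for g in week]
    # Any two groups share at most one golfer.
    return all(len(a & b) <= 1 for a, b in combinations(groups, 2))
```

For example, with four golfers, two groups of two and two weeks, pairing (0,1), (2,3) in the first week and (0,2), (1,3) in the second is accepted, whereas repeating the first week is rejected.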

We compare the same model using the FCDisjoint constraint and its decomposition 
into ILOG’s IlcAllNullIntersect constraint and cardinality constraints on 
groups. In the latter case, the filtering level is fixed either to “Default” or “Extended”. 
We understand from conversations with ILOG that “Default” implements the decomposition 
whilst “Extended” enforces BC on the global constraint. We ran experiments with a 
time limit of 10 minutes, and five settings for $m$ and $n$. For each, we present the results 
for all numbers $p$ of weeks such that at least one strategy needs at least one fail, and at 
least one strategy can solve the problem within the time limit. We solved each problem 
using five different variable ordering strategies: 

- static golfer: picks each golfer in turn, and assigns him to the first possible group 
of every week. 

- static week: picks each golfer in turn, and assigns him to one group in the first 
incomplete week. 

- min domain: picks a pair (golfer, week) such that the total number of groups in 
which the golfer can participate during the given week is minimum, then assigns 
this golfer to one group. 

- default (group): ILOG Solver’s default strategy for set variables ordered by 
groups; this picks an element $v \in ub(S)$ and adds it to the lower bound ($v \in S$). 

- default (week): ILOG Solver’s default strategy for set variables ordered by weeks. 

We observe that, in terms of fails, FCDisjoint and IlcAllNullIntersect-Extended 
are equivalent, with two exceptions³. Both outperform the decomposition 
model or are the same. The runtimes follow a similar behaviour, although the decomposition 
model can be faster when the numbers of fails are equal. The speed-up obtained by 
reasoning on the global constraint rather than on disjointness and cardinality separately 
can be of several orders of magnitude in some cases. With the two default heuristics 
(last two columns in the table), we notice no difference between our global constraint 
and the decomposition. These heuristics are not, however, always the best. The min domain 
heuristic can be superior, but sometimes needs the pruning provided by the global 
constraint to prevent poor performance. 

³ We do not understand these two exceptions but suspect there may be some complex interaction 
with the dynamic branching heuristic. 
Number of Fails / CPU Time (s) 

| problem     | model                          | static golfer   | static week   | min domain       | group (set)  | week (set)     |
|-------------|--------------------------------|-----------------|---------------|------------------|--------------|----------------|
| (6,8,4,36)  | FCDisjoint                     | 10 / 0.15       | -             | 52 / 0.14        | 183 / 0.11   |                |
|             | IlcAllNullIntersect (Extended) | 10 / 0.13       | -             | 52 / 0.11        | 183 / 0.11   |                |
|             | IlcAllNullIntersect (Default)  | -               | -             | 190 / 0.13       | 183 / 0.08   |                |
| (3,6,6,37)  | FCDisjoint                     | -               | 548 / 0.21    | 0 / 0.02         | 27 / 0.02    | 22232 / 2.36   |
|             | IlcAllNullIntersect (Extended) | -               | 548 / 0.2     | 0 / 0.02         | 27 / 0.03    | 22232 / 1.6    |
|             | IlcAllNullIntersect (Default)  | -               | -             | 0 / 0.01         | 27 / 0.02    | 22232 / 1.3    |
| (3,6,6,38)  | FCDisjoint                     | -               | 67 / 0.03     | 0 / 0.03         | 4 / 0.02     | 3446 / 0.39    |
|             | IlcAllNullIntersect (Extended) | -               | 67 / 0.04     | 0 / 0.02         | 4 / 0.03     | 3446 / 0.26    |
|             | IlcAllNullIntersect (Default)  | -               | -             | 0 / 0.01         | 4 / 0.02     | 3446 / 0.2     |
| (3,6,6,39)  | FCDisjoint                     | -               | 1261 / 0.3    | 0 / 0.02         | 7 / 0.03     | 171574 / 16.52 |
|             | IlcAllNullIntersect (Extended) | -               | 1261 / 0.27   | 0 / 0.02         | 7 / 0.02     | 171574 / 11.39 |
|             | IlcAllNullIntersect (Default)  | -               | -             | 0 / 0.02         | 7 / 0.02     | 171574 / 8.85  |
| (3,6,6,40)  | FCDisjoint                     | 12 / 0.02       | 48 / 0.03     | 0 / 0.02         | 0 / 0.02     | 8767 / 0.79    |
|             | IlcAllNullIntersect (Extended) | 12 / 0.03       | 48 / 0.03     | 0 / 0.02         | 0 / 0.03     | 8767 / 0.6     |
|             | IlcAllNullIntersect (Default)  | -               | -             | 0 / 0.02         | 0 / 0.02     | 8767 / 0.46    |
| (3,5,5,26)  | FCDisjoint                     | -               | 44 / 0.03     | 0 / 0.01         | 2 / 0        | 813 / 0.08     |
|             | IlcAllNullIntersect (Extended) | -               | 44 / 0.02     | 0 / 0.01         | 2 / 0.01     | 813 / 0.07     |
|             | IlcAllNullIntersect (Default)  | -               | 177880 / 9.62 | 0 / 0.01         | 2 / 0        | 813 / 0.05     |
| (3,5,5,27)  | FCDisjoint                     | 967161 / 160.92 | 5 / 0.01      | 0 / 0.01         | 1 / 0.01     | 62 / 0.01      |
|             | IlcAllNullIntersect (Extended) | 967161 / 96.94  | 5 / 0.01      | 0 / 0.02         | 1 / 0.01     | 62 / 0.01      |
|             | IlcAllNullIntersect (Default)  | -               | 1106 / 0.09   | 0 / 0            | 1 / 0.01     | 62 / 0.02      |
| (3,5,5,28)  | FCDisjoint                     | 9 / 0.01        | 32 / 0.03     | 0 / 0.01         | 19 / 0.01    | 661 / 0.08     |
|             | IlcAllNullIntersect (Extended) | 9 / 0.01        | 32 / 0.01     | 0 / 0.01         | 19 / 0.01    | 661 / 0.06     |
|             | IlcAllNullIntersect (Default)  | 58218 / 3.65    | 22860 / 1.23  | 0 / 0.01         | 19 / 0.01    | 661 / 0.05     |
| (3,5,5,29)  | FCDisjoint                     | 6 / 0.02        | 2 / 0.01      | 0 / 0.01         | 0 / 0.01     | 18 / 0.01      |
|             | IlcAllNullIntersect (Extended) | 6 / 0.01        | 2 / 0.01      | 0 / 0.01         | 0 / 0.01     | 18 / 0.01      |
|             | IlcAllNullIntersect (Default)  | 37208 / 2.25    | 209 / 0.02    | 0 / 0.01         | 0 / 0        | 18 / 0.01      |
| (3,9,9,83)  | FCDisjoint                     | -               | -             | 0 / 0.12         | 453 / 0.17   |                |
|             | IlcAllNullIntersect (Extended) | -               | -             |                  | 453 / 0.13   |                |
|             | IlcAllNullIntersect (Default)  | -               | -             |                  | 453 / 0.11   |                |
| (3,9,9,84)  | FCDisjoint                     | -               | -             | 0 / 0.12         | 5 / 0.09     |                |
|             | IlcAllNullIntersect (Extended) | -               | -             | 5 / 0.13         | 5 / 0.1      |                |
|             | IlcAllNullIntersect (Default)  | -               | -             |                  | 5 / 0.08     |                |
| (3,9,9,85)  | FCDisjoint                     | -               | -             | 0 / 0.13         | 30 / 0.09    |                |
|             | IlcAllNullIntersect (Extended) | -               | -             | 0 / 0.13         | 30 / 0.1     |                |
|             | IlcAllNullIntersect (Default)  | -               | -             | 1442064 / 159.75 | 30 / 0.08    |                |
| (10,9,3,30) | FCDisjoint                     | 464 / 0.84      | 264 / 0.45    |                  | 16055 / 3.75 | 15 / 0.26      |
|             | IlcAllNullIntersect (Extended) | 464 / 0.56      | 264 / 0.32    |                  | 16055 / 2.2  | 15 / 0.23      |
|             | IlcAllNullIntersect (Default)  | -               | -             |                  | 16055 / 1.99 | 15 / 0.22      |
| (10,9,3,31) | FCDisjoint                     | 37 / 0.46       | 1 / 0.25      | 0 / 0.39         | 2 / 0.28     | 113 / 0.29     |
|             | IlcAllNullIntersect (Extended) | 37 / 0.41       | 1 / 0.24      | 0 / 0.32         | 2 / 0.26     | 113 / 0.24     |
|             | IlcAllNullIntersect (Default)  | -               | 51223 / 10.45 | 0 / 0.32         | 2 / 0.25     | 113 / 0.23     |

8 Conclusions 

We have begun a systematic study of global constraints on set and multiset variables. 
We have studied here a wide range of disjoint, partition, and intersection constraints. 
The disjoint constraint on set or multiset variables is decomposable (and hence polyno- 
mial). On the other hand, the non-empty and fixed cardinality disjoint constraints are not 
decomposable without hindering bound consistency. We therefore present polynomial 
algorithms for enforcing bound consistency on the non-empty disjoint constraints for 
set or multiset variables, for enforcing BC on the fixed cardinality disjoint constraint 
for set variables, and prove that enforcing BC on the fixed cardinality disjoint con- 
straint on multiset variables is NP-hard. We give very similar results for the partition, 
non-empty and fixed cardinality partition constraints. We also identify those non-empty 
intersection constraints which are decomposable, those which are not decomposable but 
polynomial, and those that are NP-hard. Many of the propagation algorithms we pro- 
pose here exploit a dual viewpoint, and call upon existing global constraints for finite- 
domain variables like the global cardinality constraint. We are currently extending this 
study to counting constraints on set and multiset variables. Propagation algorithms for 
such constraints also appear to exploit dual viewpoints extensively. 
Abstract. We present a cooperation technique that uses careful management of 
nogoods to solve a hard real-time problem: assigning periodic tasks to 
processors in the context of fixed-priority preemptive scheduling. The problem 
is solved off-line, and our solving strategy is related to logic-based Benders 
decomposition. A master problem is solved using constraint programming, whereas 
subproblems are solved with schedulability analysis techniques coupled with an 
ad hoc nogood computation algorithm. Constraints and nogoods are learnt during 
the process and play a role close to that of Benders cuts. 



1 Introduction 

Real-time systems are at the heart of embedded systems and have applications 
in many industrial areas: telecommunication, automotive, aircraft and robotics 
systems, etc. Today, applications (e.g. cars) involve many processors to serve 
different demands (cruise control, ABS, engine management, etc.). These systems 
are made of specialized and distributed processors (interconnected through a 
network) which receive data from sensors, compute appropriate responses and send 
them to actuators. Their main characteristics lie in functional as well as 
non-functional requirements such as the physical distribution of the resources 
and timing constraints. Timing constraints are usually specified as deadlines 
for the tasks to be executed. Serious damage can occur if deadlines are not met. 
In this case, the system is called a hard real-time system and timing 
predictability is required. In this field, some related works are based on 
off-line analysis techniques that compute the response time of the constrained 
tasks. Such techniques were initiated by Liu and Layland [15] and consist in 
computing the worst-case execution scenario. Extensions were introduced later to 
take into account shared resources, distributed systems [23] or precedence 
constraints [7]. 

Our problem consists in assigning periodic and preemptive tasks with fixed 
priorities (a task is periodically activated and can be preempted by a 
higher-priority task) to distributed processors. A solution is an allocation of 
the tasks on the processors which meets the schedulability requirements. The 
problem of assigning a set of hard preemptive real-time tasks in a distributed 
system is NP-hard [14]. It has been tackled with heuristic methods [6,17], 
simulated annealing [22,4] and genetic algorithms [6,19]. However, these 
techniques are often incomplete and can fail to find any feasible assignment 
even after a long computation time. New practical approaches are still needed. 

We propose here a decomposition-based method which separates the allocation 
problem itself from the scheduling one. It is related to Benders decomposition 
and especially to logic-based Benders decomposition. On the one hand, constraint 
programming offers competitive tools to solve the assignment problem; on the 
other hand, real-time scheduling techniques are able to achieve an accurate 
analysis of the schedulability. Our method uses Benders decomposition as a way 
of generating precise nogoods in constraint programming. 

This paper is organized as follows: Section 2 introduces the problem. Related 
work and solving strategies are discussed in Section 3. The logical Benders de- 
composition scheme is briefly introduced and the links with our approach are put 
forward. Section 4 is dedicated to the master/subproblems and communication 
between them thanks to nogoods. Experimental results are presented in Section 
5 and finally, a discussion of the technique is made in Section 6. 

2 Problem Description 

2.1 The Real-Time System Architecture 

The hard real-time system we consider can be modeled with a software archi- 
tecture (the set of tasks) and a hardware architecture (the physical execution 
platform for the tasks). Such a model is used by Tindell [22]. 




Fig. 1. Main parameters of the problem: (a) hardware architecture; (b) software architecture. 



Hardware Architecture. The hardware architecture is made of a set $V = 
\{p_1, \ldots, p_k, \ldots, p_m\}$ of m identical processors, each with a fixed 
memory capacity $\mu_k$. All the processors of V have the same processing speed. 
They are connected to a network with a transit rate of $\delta$ and a token ring 
protocol. A token travels around the ring, allowing processors to send data only 
while they hold the token. It stays at the same place for a fixed maximum period 
of time, large enough to ensure that all messages waiting on processors are sent. 



Decomposition and Learning for a Hard Real Time Task Allocation Problem 






Software Architecture. To model the software architecture, we consider a 
valued, oriented and acyclic graph (T, C). The set of nodes $T = \{t_1, \ldots, 
t_n\}$ corresponds to the tasks, whereas the set of edges $C \subseteq T \times 
T$ refers to message sending between tasks. 

A task $t_i$ is defined by its temporal characteristics and resource needs: its 
period, $T_i$ (a task is periodically activated); its worst-case execution time 
without preemption, $WCET_i$; and its memory need, $m_i$. Edges $c_{ij} = (t_i, 
t_j) \in C$ are valued with the amount of exchanged data, $d_{ij}$. 
Communicating tasks have the same activation period. Moreover, they are able to 
communicate in two ways: a local communication with no delay using the memory of 
the processor (requiring the tasks to be located on the same processor) and a 
distant communication using the network. In any case, we do not consider 
precedence constraints. Tasks are periodically activated in an independent way; 
they read and write data at the beginning and the end of their execution. 

Finally, each processor is scheduled with a fixed-priority strategy. A 
priority $prio_i = i$ is given to each task: $t_j$ has priority over $t_i$ if 
and only if $prio_j < prio_i$, and a task execution may be preempted by 
higher-priority tasks. 

2.2 The Allocation Problem 

An allocation is a mapping $A : T \to V$ that assigns each task $t_i$ to a 
processor $p_k$: 

$t_i \mapsto A(t_i) = p_k$ (1) 

The allocation problem consists in finding a mapping A which respects the 
constraints described below. 



Timing Constraints. They are expressed by means of deadlines for the tasks. 
Timing constraints require the duration between the activation date of any 
instance of the task $t_i$ and its completion time to be bounded by its deadline 
$D_i$ (the constraint on $D_i$ is detailed in Section 4.2). 



Resource Constraints. Three kinds of constraints are considered: 

— Memory capacity: The memory use of a processor $p_k$ cannot exceed its 
capacity ($\mu_k$): 

$\forall k = 1..m, \quad \sum_{A(t_i) = p_k} m_i \le \mu_k$ (2) 

— Utilization factor: The utilization factor of a processor cannot exceed its 
processing capacity. The ratio $r_i = WCET_i/T_i$ means that a processor is 
used a fraction $r_i$ of the time by the task $t_i$. The following inequality is 
a simple necessary condition of schedulability: 

$\forall k = 1..m, \quad \sum_{A(t_i) = p_k} WCET_i/T_i \le 1$ (3) 
— Network use: To avoid overload, the amount of data carried along the 
network per unit of time cannot exceed the network capacity: 

$\sum_{c_{ij} = (t_i, t_j),\ A(t_i) \ne A(t_j)} d_{ij}/T_i \le \delta$ (4) 
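
The three resource constraints can be checked directly on a candidate allocation. The following is only a minimal sketch, assuming integer task data stored in plain dictionaries; the function name `resource_violations` and all parameter names are illustrative and do not come from the paper.

```python
from fractions import Fraction

def resource_violations(alloc, mem, wcet, period, cap, messages, delta):
    """Check constraints (2), (3) and (4) for one allocation.

    alloc:    task -> processor
    cap:      processor -> memory capacity
    messages: (i, j) -> amount of data d_ij sent from task i to task j
    delta:    network capacity (data per unit of time)
    """
    violations = []
    for p in cap:
        tasks = [t for t, q in alloc.items() if q == p]
        # (2) memory use must not exceed the capacity of the processor
        if sum(mem[t] for t in tasks) > cap[p]:
            violations.append(("memory", p))
        # (3) utilization must not exceed 1 (exact rational arithmetic)
        if sum(Fraction(wcet[t], period[t]) for t in tasks) > 1:
            violations.append(("utilization", p))
    # (4) only messages between tasks on different processors use the network
    load = sum(Fraction(d, period[i])
               for (i, j), d in messages.items() if alloc[i] != alloc[j])
    if load > delta:
        violations.append(("network", None))
    return violations
```

Exact fractions are used so that borderline cases (utilization exactly equal to 1) are not affected by floating-point rounding.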

Allocation Constraints. Allocation constraints are due to the system 
architecture. We distinguish three kinds of constraints: residence, co-residence 
and exclusion. 

— Residence: A task sometimes needs a specific hardware or software resource 
which is only available on specific processors (e.g. a task monitoring a sensor 
has to run on a processor connected to the input peripheral). It is a couple 
$(t_i, \alpha)$ where $t_i \in T$ is a task and $\alpha \subseteq V$ is the set 
of available processors for the task. A given allocation A must respect: 

$A(t_i) \in \alpha$ (5) 

— Co-residence: This constraint forces several tasks to be placed on the 
same processor (they share a common resource). Such a constraint is defined 
by a set of tasks $\beta \subseteq T$ and any allocation A has to fulfil: 

$\forall (t_i, t_j) \in \beta^2, \quad A(t_i) = A(t_j)$ (6) 

— Exclusion: Some tasks may be replicated for fault tolerance and therefore 
cannot be assigned to the same processor. It corresponds to a set $\gamma 
\subseteq T$ of tasks which cannot be placed together. An allocation A must 
satisfy: 

$\forall (t_i, t_j) \in \gamma^2, \quad A(t_i) \ne A(t_j)$ (7) 

An allocation is said to be valid if it satisfies allocation and resource con- 
straints. It is said to be schedulable if it satisfies timing constraints. A solution 
for our problem is a valid and schedulable allocation of the tasks. 
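
As an illustration of the allocation constraints (5) to (7), the sketch below tests an allocation against residence, co-residence and exclusion sets. The data layout and the name `allocation_violations` are assumptions made for this example; the exclusion test is applied to pairs of distinct tasks only.

```python
from itertools import combinations

def allocation_violations(alloc, residence, co_residence, exclusion):
    """alloc: task -> processor
    residence:    task -> set of allowed processors      (eq. 5)
    co_residence: list of task sets that must co-locate  (eq. 6)
    exclusion:    list of task sets that must be spread  (eq. 7)
    """
    bad = []
    for t, allowed in residence.items():
        if alloc[t] not in allowed:
            bad.append(("residence", (t,)))
    for group in co_residence:
        # all tasks of the group must share one processor
        if len({alloc[t] for t in group}) > 1:
            bad.append(("co-residence", tuple(sorted(group))))
    for group in exclusion:
        # no two distinct tasks of the group may share a processor
        for a, b in combinations(sorted(group), 2):
            if alloc[a] == alloc[b]:
                bad.append(("exclusion", (a, b)))
    return bad
```

An allocation is valid in the sense used above when this check and the resource check both report no violation.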

3 About Related Decomposition Approaches 

Our approach is based to a certain extent on a Benders decomposition [2] scheme. 
We therefore introduce it to highlight the underlying concepts. Benders 
decomposition can be seen as a form of learning from mistakes. It is a solving 
strategy that partitions the variables of the problem into two groups, x and y. 
The strategy can be applied to a problem of this general form: 

P : Min $f(x) + cy$ 
s.t. $g(x) + Ay \ge a$, with $x \in D$, $y \ge 0$ 

A master problem considers only a subset of variables x (often integer 
variables, D being a discrete domain). A subproblem (SP) tries to complete the 
assignment on y and produces a Benders cut that is added to the master problem. 
This cut has the form $z \ge h(x)$ and constitutes the key point of the method; 
it is inferred from the dual of the subproblem. Let us consider an assignment 
$x^*$ given by the master. The subproblem (SP) and its dual (DSP) can be written 
as follows: 

SP : Min $cy$ s.t. $Ay \ge a - g(x^*)$, with $y \ge 0$ 
DSP : Max $u(a - g(x^*))$ s.t. $uA \le c$, with $u \ge 0$ 

Duality theory ensures that $cy \ge u(a - g(x^*))$. As feasibility of the dual 
is independent of $x^*$, $cy \ge u(a - g(x))$ and the following inequality is 
valid: $f(x) + cy \ge f(x) + u(a - g(x))$. Moreover, according to duality, the 
optimal value of the dual, $u^*(a - g(x^*))$, equals the optimal value of $cy$. 
Even if the cut is derived from a particular $x^*$, it is valid for all x and 
excludes a large class of assignments which share common characteristics that 
make them inconsistent. The number of solutions to explore is reduced and the 
master problem can be written at the I-th iteration: 

PM : Min $z$ 
s.t. $z \ge f(x) + u_i^*(a - g(x)), \quad \forall i < I$ 

From all of this, it can be noticed that dual variables need to be defined to 
apply the decomposition. However, [8] proposes to overcome this limit and to 
enlarge the classical notion of dual by introducing an inference dual available 
for all kinds of subproblems. It refers to a more general scheme and suggests a 
different way of thinking about duality: a Benders decomposition based on logic. 
Duality now means being able to produce a proof: the logical proof of optimality 
of the subproblem and of the correctness of the inferred cuts. In the original 
Benders decomposition, this proof is established thanks to duality theorems. 

For a discrete satisfaction problem, the resolution of the dual consists in com- 
puting the infeasibility proof of the subproblem and determining under what 
conditions the proof remains valid. It therefore infers valid cuts. 
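
For a satisfaction problem, the scheme reduces to a loop in which the master proposes an assignment and the subproblem either accepts it or returns a cut. The sketch below is a toy illustration only: the master is a brute-force enumeration (the paper uses a constraint solver), and the names `benders_loop` and `make_capacity_subproblem`, as well as the bin-capacity subproblem itself, are assumptions chosen to keep the example self-contained.

```python
from itertools import product

def benders_loop(n_vars, domain, subproblem, max_iter=1000):
    """Return (solution, cuts); solution is None if the master becomes infeasible."""
    cuts = []  # each cut is a predicate over a complete assignment
    for _ in range(max_iter):
        # master: first assignment that satisfies every cut learnt so far
        candidate = next((x for x in product(domain, repeat=n_vars)
                          if all(cut(x) for cut in cuts)), None)
        if candidate is None:
            return None, cuts
        cut = subproblem(candidate)
        if cut is None:          # subproblem feasible: candidate is a solution
            return candidate, cuts
        cuts.append(cut)         # learn from the failure and iterate
    raise RuntimeError("iteration limit reached")

def make_capacity_subproblem(weights, capacity):
    """Toy subproblem: items placed on the same bin must fit its capacity."""
    def subproblem(x):
        for b in sorted(set(x)):
            items = [i for i, v in enumerate(x) if v == b]
            if sum(weights[i] for i in items) > capacity:
                # cut: these items may not all take the same value again
                return lambda y, items=items: len({y[i] for i in items}) > 1
        return None
    return subproblem
```

Each cut forbids the failing group of items on any bin, not just the bin where the failure was observed, which is the property that makes a cut stronger than a plain nogood on the current assignment.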

The success of the decomposition depends on both the degree to which 
decomposition can exploit structures and the quality of the cuts inferred. [8] 
suggests identifying classes of structured problems that exhibit useful 
characteristics for the Benders decomposition. Off-line scheduling problems fall 
into such classes and [10] demonstrates the efficiency of such an approach on a 
scheduling problem with dissimilar parallel machines. 

Our approach is strongly connected to Benders decomposition and the related 
concepts. It is inspired by methods used to integrate constraint programming 
into a Benders scheme [21,3]. The allocation and resource problem is 
considered on one side and schedulability on the other. The subproblem 
checks the schedulability of an allocation, finds out why it is unschedulable 
and designs a set of constraints (both symbolic and arithmetic) which rule out 
all assignments that are unschedulable for the same reason. Our approach 
therefore agrees with Benders decomposition on this central element: the Benders 
cut. The proof proposed here is based on off-line analysis techniques from 
real-time scheduling. One might think that a fast analytic proof could not 
provide enough relevant information on the inconsistency. As the speed of 
convergence and the success of the technique greatly depend on the quality of 
the cut, a conflict detection algorithm, QuickXplain [11], is coupled with the 
analytic techniques. Moreover, the master problem is considered as a dynamic 
problem to avoid redundant computations as much as possible. 

4 Solving Strategy 

The solving process requires a tight cooperation between master and 
subproblem(s). Both problems share a common model, introduced in the next 
section, so that nogoods can be exchanged easily. Both are presented before 
examining the cooperation mechanisms and the incremental resolution of the master. 

4.1 Master Problem 

The master problem is solved using constraint programming techniques. The 
model is based on a redundant formulation using three kinds of variables: x, y 
and w. First, let us consider n integer variables x (our decision variables), 
one per task, representing the processor selected to process the task: 
$\forall i \in \{1..n\}, x_i \in [1..m]$. Secondly, boolean variables y indicate 
the presence of a task on a processor: $\forall i \in \{1..n\}, \forall p \in 
\{1..m\}, y_{ip} \in \{0, 1\}$. Finally, boolean variables w are introduced to 
express whether or not a pair of tasks exchanging a message are located on the 
same processor: $\forall c_{ij} = (t_i, t_j) \in C, w_{ij} \in \{0, 1\}$. 
Integrity constraints (channeling constraints) are used to enforce the 
consistency of the redundant model. Links between x, y and w are made using 
element constraints. One of the main objectives of the master problem is to 
solve the assignment part efficiently. It handles two kinds of constraints: 
allocation and resources. 

— Residence (cf. eq. (5)): it consists of forbidden values for x. A constraint 
is added for each forbidden processor p of $t_i$: $x_i \ne p$ 

— Co-residence (cf. eq. (6)): $\forall (t_i, t_j) \in \beta^2, x_i = x_j$ 

— Exclusion (cf. eq. (7)): $alldifferent(x_i \mid t_i \in \gamma)$ 

— Memory capacity (cf. eq. (2)): $\forall p \in \{1..m\}, \sum_{i \in \{1..n\}} y_{ip} \times m_i \le \mu_p$ 

— Utilization factor (cf. eq. (3)): Let lcm(T) be the least common multiple 
of the periods of the tasks. The constraint can be written as follows: 

$\forall p \in \{1..m\}, \sum_{i \in \{1..n\}} lcm(T) \times WCET_i \times y_{ip} / T_i \le lcm(T)$ 

— Network use (cf. eq. (4)): The network capacity is bounded by $\delta$. 
Therefore, the size of the set of messages carried on the network cannot exceed 
this limit: 

$\sum_{i \in \{1..n\}} lcm(T) \times d_{ij} \times w_{ij} / T_i \le lcm(T) \times \delta$ 

Utilization factor and network use are reformulated with the lcm of the task 
periods because our constraint solver cannot currently handle constraints with 
real coefficients and integer variables. 
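
The effect of the lcm reformulation can be seen on a small example: multiplying both sides of the utilization inequality by lcm(T) leaves only integer coefficients. The sketch below assumes integer periods and execution times and Python 3.9 or later (for `math.lcm`); the function name is illustrative.

```python
from math import lcm

def scaled_utilization_ok(wcet, period, on_proc):
    """Integer form of sum(WCET_i / T_i) <= 1 for the tasks placed on one processor."""
    h = lcm(*period.values())      # lcm(T) over all task periods
    # h is a multiple of every period, so h // period[i] is an exact integer
    lhs = sum((h // period[i]) * wcet[i] for i in on_proc)
    return lhs <= h
```

With periods 4, 6 and 5, lcm(T) is 60, so a task with WCET 2 and period 4 contributes 15 × 2 = 30 to the left-hand side instead of the real coefficient 0.5.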
Fig. 2. Illustration of a schedulability analysis. The task $t_4$ does not meet its deadline. 
The subset $\{t_1, t_3, t_4\}$ is identified to explain the unschedulability of the system. 

4.2 Subproblem(s) 

An assignment provided by the master problem is a valid allocation of tasks. 
The problem here is to decide whether it is schedulable and, if it is not, to 
determine why. 

Independent Tasks. The first schedulability analysis was initiated by Liu 
and Layland [15] for mono-processor real-time systems with independent and 
fixed-priority tasks. The analysis consists in computing for each task $t_i$ its 
worst-case response time, $WCRT_i$. The aim is to build the worst execution 
scenario, which penalizes as much as possible the execution of $t_i$. 

For independent tasks, it has been proved that the worst execution scenario 
for a task $t_i$ happens when all tasks with a higher priority are awoken 
simultaneously (date d on Figure 2). The worst-case response time of $t_i$ is: 

$WCRT_i = WCET_i + \sum_{t_j \in hp(A, t_i)} \lceil WCRT_i / T_j \rceil \, WCET_j$ (8) 

$hp(A, t_i)$ corresponds to the set of tasks with a higher priority than $t_i$ 
and located on the processor $A(t_i)$ for a given allocation A. $WCRT_i$ is 
easily obtained by looking for the fixed point of equation (8). Then, it is 
sufficient to compare, for each task, its worst-case response time with its 
deadline $D_i$ to know whether the system is schedulable. In this case, the 
deadline of a task is equal to its period ($D_i = T_i$). 
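
The fixed point of equation (8) can be computed by simple iteration, starting from $WCET_i$ and stopping as soon as the deadline is exceeded. This is a minimal sketch assuming integer execution times and periods; the function name `wcrt` and its signature are illustrative.

```python
def wcrt(task, higher, wcet, period, deadline):
    """Worst-case response time of `task` by fixed-point iteration on eq. (8).

    higher: the higher-priority tasks located on the same processor.
    Returns None if the response time exceeds `deadline`.
    """
    r = wcet[task]
    while True:
        # -(-r // T) is the integer ceiling of r / T
        nxt = wcet[task] + sum(-(-r // period[j]) * wcet[j] for j in higher)
        if nxt > deadline:
            return None          # the task misses its deadline
        if nxt == r:
            return r             # fixed point reached
        r = nxt
```

The iterates never decrease and are bounded by the deadline, so the loop always terminates. For three tasks with (WCET, period) equal to (1, 4), (2, 6) and (3, 12), the lowest-priority task goes through the values 3, 6, 7, 9, 10 and stabilizes at 10.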

Communicating Tasks on a Token Ring. The result computed by a task must be 
made available before its next period to ensure regular data refreshment between 
tasks. The messages must reach their destination within the time allowed. With 
the token ring protocol, the maximum delay of transmission on the network 
is bounded and the TRT is proved to be an upper bound. This duration is 
computed by taking into account all the messages to be sent on the network: 

$TRT = \sum_{\{c_{ij} = (t_i, t_j) \mid A(t_i) \ne A(t_j)\}} d_{ij} / \delta$ (9) 
The deadline for tasks sending data to non co-located tasks becomes $D_i = 
T_i - TRT$. A sufficient condition of schedulability is written: 

$WCET_i + \sum_{t_j \in hp(A, t_i)} \lceil WCRT_i / T_j \rceil \, WCET_j \le T_i - TRT$ (10) 
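
The bound of equation (9) and the resulting deadlines can be computed as follows. This is a sketch under assumptions: message sizes and the rate are integers, the helper names `trt` and `deadlines` are invented for the example, and only the sending task of a message that crosses the network gets a reduced deadline, as stated in the text.

```python
from fractions import Fraction

def trt(alloc, messages, delta):
    """Eq. (9): time needed to send every message that crosses the network."""
    return sum(Fraction(d, delta)
               for (i, j), d in messages.items() if alloc[i] != alloc[j])

def deadlines(alloc, messages, period, delta):
    """D_i = T_i - TRT for tasks sending over the network, D_i = T_i otherwise."""
    bound = trt(alloc, messages, delta)
    senders = {i for (i, j) in messages if alloc[i] != alloc[j]}
    return {t: (period[t] - bound if t in senders else period[t])
            for t in period}
```

The deadlines returned here can then be compared with the worst-case response times of equation (8) to test schedulability.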



4.3 Cooperation Between Master and Subproblem(s) 

A complete or partial assignment of variables x, y, w will now be considered. The 
key point is to find an accurate explanation that encompasses all values of x for 
which the infeasibility proof (obtained for particular values of x) remains valid. 
We know at least that the current assignment is contradictory; in other words, 
a nogood is identified. The links between the concept of nogood [20] introduced 
in constraint programming and the Benders cut are underlined in [9]. 

Independent Tasks. m independent subproblems, one per processor, are solved. 
The schedulability of a processor k is established by applying equation (8) to 
each task $t_i$ located on k ($x_i = k$) in descending order of priority until a 
contradiction occurs. For instance, in Figure 2, the set $(t_1, t_2, t_3, t_4)$ 
is unschedulable. It explains the inconsistency but is not minimal: the set 
$(t_1, t_3, t_4)$ is sufficient to justify the contradiction. In order to 
compute more precise explanations (i.e. achieve a more relevant learning), a 
conflict detection algorithm, QuickXplain [11], has been used to determine the 
minimal involved set of tasks (w.r.t. inclusion). The propagation algorithm 
considered here is equation (8). Tasks are added from $t_1$ until a 
contradiction occurs on $t_c$; the last added task $t_c$ belongs to the minimal 
conflict c. The algorithm restarts by initially adding the tasks involved in c. 
When c is inconsistent, it represents the minimal conflict among the initial set 
$(t_1, \ldots, t_c)$. The subset of tasks $T' \subseteq T$ corresponds to a 
NotAllEqual constraint (see footnote 1) on x: 

$NotAllEqual(x_i \mid t_i \in T')$ 

It is worth noting that the constraint could be expressed as a linear 
combination of variables y. However, $NotAllEqual(x_1, x_3, x_4)$ excludes the 
solutions that contain the tasks 1, 3, 4 gathered on any processor. 
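
The iterative scheme described above (add tasks until a contradiction occurs, keep the last task added, restart from the tasks kept so far) can be sketched as follows. This is not the QuickXplain implementation of [11], which adds a divide-and-conquer strategy; it only illustrates the basic restart loop. The task data are invented so that the minimal conflict is {1, 3, 4}, and a smaller task index is assumed to mean a higher priority.

```python
def schedulable(tasks, wcet, period):
    """Eq. (8) on one processor, with deadline = period for every task."""
    for t in tasks:
        higher = [j for j in tasks if j < t]
        r = wcet[t]
        while True:
            nxt = wcet[t] + sum(-(-r // period[j]) * wcet[j] for j in higher)
            if nxt > period[t]:
                return False
            if nxt == r:
                break
            r = nxt
    return True

def minimal_conflict(tasks, is_consistent):
    """Return a subset of `tasks` that is inconsistent and minimal w.r.t. inclusion.

    Assumes `tasks` as a whole is inconsistent and that adding tasks
    never turns an inconsistent set into a consistent one.
    """
    conflict = []
    while is_consistent(conflict):
        current = list(conflict)
        for t in tasks:
            if t in conflict:
                continue
            current.append(t)
            if not is_consistent(current):
                conflict.append(t)   # the last task added belongs to the conflict
                break
        else:
            raise ValueError("the full task set is consistent")
    return conflict

# Hypothetical data: task -> WCET and task -> period
WCET = {1: 2, 2: 1, 3: 2, 4: 2}
PERIOD = {1: 5, 2: 20, 3: 6, 4: 7}
CONFLICT = minimal_conflict([1, 2, 3, 4], lambda s: schedulable(s, WCET, PERIOD))
```

On this data the loop keeps task 4 first, then task 3, then task 1, and stops because these three tasks alone are unschedulable; task 2 is never part of the explanation.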

Communicating Tasks on a Token Ring. The difficulty is to avoid incriminating 
the whole system: 

1. At first, the network is simply not considered. If a processor is unschedulable 
without taking into account the additional latency due to the exchange of 
messages, it is still unschedulable in the general case. We can again infer: 
$NotAllEqual(x_i \mid t_i \in T')$. 

2. Secondly, we only consider the network. When the sending tasks have a 
period less than TRT, the token does not come back early enough to allow 
the end of their execution. In this case, equation (10) will never be satisfied. 
A set of inconsistent messages $M \subseteq C$ is obtained: 

$\sum_{c_{ij} \in M} w_{ij} < |M|$ 

3. The last test consists in checking equation (10). A failure returns a set 
$T' \subseteq T$ of tasks which is inconsistent with a set of messages $M 
\subseteq C$. It corresponds to a nogood. We use a specific constraint to take 
advantage of symmetries and to forbid this assignment as well as permutations of 
tasks among processors. It can be written as a disjunction between the two 
previous cuts: 

$nogood(x_i \mid t_i \in T', c_{ij} \in M) = NotAllEqual(x_i \mid t_i \in T') \vee \sum_{c_{ij} \in M} w_{ij} < |M|$ 

1 A NotAllEqual on a set V of variables ensures that at least two variables among V 
take distinct values. 



QuickXplain has been used again to refine the information given in points 2 and 
3. Let us now continue with the question of how information learnt from the 
previous failures can be integrated efficiently. [21] outlines this problem and 
notices a possible significant overhead due to redundant calculations. To address 
this issue, we considered the master problem as a dynamic problem. 
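
The combined cut of point 3 can be evaluated on a complete assignment as sketched below. The encoding is an assumption made for this example: $w_{ij}$ is taken to be 1 when the two tasks of a message sit on different processors (the message uses the network), which is the reading under which the sum must stay below $|M|$. The function names are illustrative.

```python
def not_all_equal(x, tasks):
    """True if at least two of the given tasks are on different processors."""
    return len({x[t] for t in tasks}) > 1

def nogood_holds(x, tasks, messages):
    """Cut of point 3: NotAllEqual(tasks) OR sum(w_ij for messages) < |M|.

    x:        task -> processor
    messages: list of (i, j) pairs, the set M
    """
    w = [1 if x[i] != x[j] else 0 for (i, j) in messages]
    return not_all_equal(x, tasks) or sum(w) < len(messages)
```

Because the cut only looks at which tasks share a processor, it rejects the failing assignment and every assignment obtained from it by permuting processors.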

Incremental Resolution. Solving dynamic constraint problems has led to dif- 
ferent approaches. Two main classes of methods can be distinguished: proactive 
and reactive methods. On the one hand, proactive methods propose to build ro- 
bust solutions that remain solutions even if changes occur. On the other hand, re- 
active methods try to reuse as much as possible previous reasonings and solutions 
found in the past. They avoid restarting from scratch and can be seen as a form 
of learning. One of the main methods currently used to perform such learning is 
a justification technique that keeps track of the inferences made by the solver 
during the search. Such an extension of constraint programming has been recently 
introduced [12]: explanation-based constraint programming (e-constraints). 

Definition 1 An explanation records information to justify a decision of the 
solver, such as a reduction of domain or a contradiction. It is made of a set of 
constraints C' (a subset of the original constraints of the problem) and a set 
of decisions $dc_1, \ldots, dc_n$ taken during search. An explanation of the 
removal of value a from variable v will be written: $C' \wedge dc_1 \wedge dc_2 
\wedge \cdots \wedge dc_n \Rightarrow v \ne a$. 

When a domain is emptied, a contradiction is identified. An explanation for 
this contradiction is computed by uniting each explanation of each removal of 
value of the variable concerned. At this point, dynamic backtracking algorithms 
that only question a relevant decision appearing in the conflict are conceivable. 
By keeping in memory a relevant part of the explanations involved in conflicts, 
a learning mechanism can be implemented [13]. 
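
The way a contradiction is explained can be pictured with a small data structure: each value removal carries its own explanation, and the explanation of an emptied domain is their union. This is an illustrative sketch only and not the data structure of the PaLM solver; the class and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Explanation:
    constraints: frozenset   # subset C' of the original constraints
    decisions: frozenset     # decisions dc_1, ..., dc_n taken during search

    def union(self, other):
        return Explanation(self.constraints | other.constraints,
                           self.decisions | other.decisions)

def explain_wipeout(removals):
    """removals: value -> Explanation, for every value removed from an emptied domain."""
    result = Explanation(frozenset(), frozenset())
    for explanation in removals.values():
        result = result.union(explanation)
    return result
```

The decisions collected in the result are the candidates that a dynamic backtracking algorithm may question to repair the contradiction.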

Here, explanations allow us to perform an incremental resolution of the master 
problem. At each iteration, the constraints added by the subproblem generate 
a contradiction. Instead of backtracking to the last choice point as usual, the 
current solution of the master problem is repaired by removing the decisions that 
occur in the contradiction, as done by the MAC-DBT algorithm [12]. Tasks 
assigned at the beginning of the search can be moved without disturbing the whole 
allocation. In addition, the model reinforcement phase tries to transform a learnt 
set of elementary constraints that have been added at previous iterations into 
higher-level constraints. Explanations make it easy to dynamically add 
or remove a constraint from the constraint network [12]. 

Notice that the master problem is never re-started. It is solved only once but 
is gradually repaired using the dynamic abilities of the explanation-based solver. 

Model Reinforcement. Pattern recognition among a set of constraints that 
expresses specific subproblems is a critical aspect of the modelling step. 
Constraint learning deals with the problem of automatically recognizing such 
patterns. We would like to perform a similar process in order to extract global 
constraints from a set of elementary constraints. For instance, a set of 
difference constraints can be formulated as an all-different constraint by 
looking for a maximal clique in the induced constraint graph. This is a 
well-known problem in constraint programming, and a version of the Bron-Kerbosch 
algorithm [5] has been implemented to this end (difference constraints occur 
when NotAllEquals involve only two tasks). In a similar way, a set of 
NotAllEqual constraints can be expressed by a global cardinality constraint 
(gcc) [18]. It now corresponds to a maximal clique in a hypergraph (where 
hyperarcs between tasks are NotAllEquals). However, how to do this efficiently 
is still an open question for us; an answer could significantly improve 
performance. 
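
The first re-modelling step, from difference constraints to all-different constraints, can be sketched with the basic Bron-Kerbosch recursion. The version below has no pivoting and is not claimed to be the variant implemented by the authors; the function names and the choice to keep only cliques of three or more variables are assumptions of this example.

```python
def maximal_cliques(adj):
    """Basic Bron-Kerbosch enumeration. adj: vertex -> set of neighbours."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))    # r cannot be extended: maximal clique
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}                  # v has been fully explored
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def differences_to_alldifferent(diffs):
    """diffs: list of pairs (a, b) meaning x_a != x_b.

    Returns the groups of variables that can be stated as one alldifferent.
    """
    adj = {}
    for a, b in diffs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return [c for c in maximal_cliques(adj) if len(c) >= 3]
```

For example, the differences between tasks (1, 2), (1, 3), (2, 3) and (3, 4) yield the single group [1, 2, 3]; the pair (3, 4) stays a plain difference constraint.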

5 First Experimental Results 

For the allocation problem, specific benchmarks are not provided in real-time 
scheduling. Experiments are usually done on didactic examples [22, 1] or 
randomly generated configurations [17,16]. We opted for this last solution. Our 
generator takes several parameters into account: 

— n, m, mes: the number of tasks, processors (experiments have been done on 
a fixed size: n = 40 and m = 7) and messages; 

— $\%_{global}$: the global utilization factor of processors; 

— $\%_{mem}$: the memory over-capacity, i.e. the amount of additional memory 
available on processors with respect to the memory needs of all tasks; 

— $\%_{res}$: the percentage of tasks included in residence constraints; 

— $\%_{co-res}$: the percentage of tasks included in co-residence constraints; 

— $\%_{exc}$: the percentage of tasks included in exclusion constraints; 

— $\%_{msize}$: the size of a message is evaluated as a percentage of the period of 
the tasks exchanging it. 

Task periods and priorities are randomly generated. However, worst-case 
execution times are initially randomly chosen and then rescaled to respect 
$\sum_{i=1}^{n} WCET_i/T_i = m \times \%_{global}$. The memory need of a task is 
proportional to its worst-case execution time. Memory capacities are randomly 
generated but must satisfy $\sum_{k=1}^{m} \mu_k = (1 + \%_{mem}) \sum_{i=1}^{n} m_i$. 
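
The rescaling of execution times to a target global utilization can be sketched as follows. This is a simplified illustration and not the authors' generator: the initial distribution, the seed handling and the function name are assumptions, the result is left as real numbers, and nothing prevents a single task from exceeding a utilization of 1.

```python
import random

def generate_wcets(periods, m, global_util, seed=0):
    """Draw execution times, then rescale so that sum(WCET_i / T_i) = m * global_util."""
    rng = random.Random(seed)
    raw = [rng.uniform(0.1, 1.0) * t for t in periods]      # initial random draw
    total = sum(c / t for c, t in zip(raw, periods))
    scale = m * global_util / total
    return [c * scale for c in raw]
```

For instance, with m = 2 processors and a global utilization of 60%, the execution times returned sum to a total utilization of 1.2, whatever the periods.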

The number of tasks involved in allocation constraints is given by the 
parameters $\%_{res}$, $\%_{co-res}$ and $\%_{exc}$. Tasks are randomly chosen 
and their number (involved in co-residence and exclusion constraints) can be set 
through specific levels. Several classes of problems have been defined depending 
on the difficulty of both allocation and schedulability problems. The difficulty 
of schedulability is evaluated using the global utilization factor 
$\%_{global}$, which varies from 40 to 90%. Allocation difficulty is based on 
the number of tasks included in residence, co-residence and exclusion 
constraints ($\%_{res}$, $\%_{co-res}$, $\%_{exc}$). Moreover, the memory 
over-capacity $\%_{mem}$ has a significant impact (a very low over-capacity can 
lead to solving a packing problem, sometimes a very difficult one). The presence 
of messages impacts both problems, and the difficulty has been characterized by 
the ratios mes/n and $\%_{msize}$. As we consider precedence chains, we cannot 
have more than one message per task and the ratio mes/n is always less than 1. 
$\%_{msize}$ reflects the impact of messages on schedulability analysis by 
linking periods and message sizes. 

Table 1 describes the parameters and difficulty classes of the considered 
problems. For instance, a class 2-1-4 indicates a problem with an allocation 
difficulty in class 2, a schedulability difficulty in class 1 and a network 
difficulty in class 4. 



Table 1. Details on classes of difficulty. 

Alloc.  %mem  %res  %co-res  %exc  |  Sched.  %global  |  Mes.  mes/n  %msize 
1       80    0     0        0     |  1       40       |  1     0.5    40 
2       40    15    15       15    |  2       60       |  2     0.5    70 
3       30    25    25       25    |  3       75       |  3     0.75   70 
4       15    35    35       35    |  4       90       |  4     0.875  150 

5.1 Independent Tasks 

Table 2 summarizes the results of our experiments. Iter is the number of 
iterations between master and subproblems; NotAllEq and Diff are the numbers 
of NotAllEqual and difference constraints inferred. CPU is the resolution time 
in seconds and Xplain indicates whether the QuickXplain algorithm has been used. 
Finally, % Success gives the percentage of instances successfully solved (a 
schedulable solution has been found or the proof of inconsistency has been done) 
within the time limit of 10 minutes per instance. The data are averages (over 
the instances solved within the time limit) on 100 instances per class of 
difficulty, obtained with a Pentium 4 at 3 GHz and the Java version of PaLM [12]. 

Table 2. Average results on 100 instances randomly generated into classes of problems. 

Cat (Alloc/Sched)  Xplain  Iter    NotAllEq  Diff   CPU (s)  % Success 
1-1                N       46.35   91.29     4.45   0.58     100% 
1-1                Y       10.59   39.79     12.41  0.28     100% 
1-2                Y       26.75   96.93     28.50  3.46     99% 
1-3                Y       65.23   213.87    39.21  28.70    94% 
1-4                Y       100.88  373.08    57.82  93.40    40% 
2-2                Y       46.00   168.27    23.13  34.51    91% 
2-3                Y       58.89   233.63    37.06  71.18    81% 
3-4                Y       138.29  131.22    40.65  62.12    91% 

The class 1-4 represents the hardest class of problems. Without the allocation 
problem, the initial search space is complete and everything has to be learnt. 
Moreover, these problems are close to inconsistency due to the hardness of the 
schedulability. The limits of our approach seem to be reached in such a case 
without an efficient re-modeling of NotAllEqual constraints into gcc (see 4.3). 
The cuts generated seem actually quite efficient. A relevant learning can be 
made in the case of independent tasks by solving m independent subproblems. Of 
course, if the symmetry of the processors does not hold, this could be 
questionable. 

The execution of a particular and hard instance of class 2-3 is outlined in 
Figure 3. Resolution time and learnt constraints at each iteration are detailed. 
The master problem adapts the current solution to the cuts thanks to its dynamic 
abilities, and the learning process is very quick at the beginning. The number 
of cuts decreases until a hard satisfaction problem is formulated (a-b in Fig. 
3). The master is then forced to question a lot of choices to provide a valid 
allocation (b). The process starts again with a quick learning of nogoods (b-c, 
c-d). 




Fig. 3. Execution of a hard instance of class 2-3. Resolution time and a moving average 
(step 10) of the number of cuts (dotted lines) inferred at each iteration are shown (310 
iterations, 1192 NotAllEqual, 75 differences partially re-modeled into 12 alldifferent). 



5.2 Communicating Tasks on a Token Ring 

We chose to evaluate the technique on a well-known instance of real-time
scheduling: the Tindell instance [22], which was solved using simulated
annealing. This instance exhibits a particular structure: the network plays a
critical part and feasible solutions have an almost minimal network
utilization. We were forced to
Table 3. Average results on 100 instances randomly generated into classes of problems.

Cat (A/S/M)   Iter    NotAllEq   Diff    NetCuts   Nogoods   CPU (s)   %Succ
2-1-1         34,7     47,7      24       23,5       8,6      24,7      98%
2-1-2         40,1     56,9      25,4     36,2       8,4      18,6      93%
2-1-3         91,9     64,2      23,5    134,3      27,2     106,6      56%
2-2-1         58,9    118,4      47,5     11,2       2,7      72,7      82%
2-2-2         55,3    116,5      46,9     45,2       9,2      60,5      74%
2-2-3         77,6     97,3      39,1     96,2      43,1     142,1      38%

specialize our generic approach on this particular point through the use of an
allocation heuristic that tries to gather tasks exchanging messages. One can
obtain the solution of Tindell very quickly (in less than 10 seconds) by
minimizing the network utilization at each iteration. Moreover, we tested our
approach on random problems involving messages:

Table 3 shows that when several hardness aspects compete in the problem, the
difficulty increases (2-2-3 compared to 1-1-3). The presence of messages makes
the problem much more complex for our approach because the independence of
subproblems (a key point of Benders) is lost and the network cut is a weak one.
Determining which tasks should or should not be placed together becomes a
difficult question when a tight overall memory is combined with a difficult
schedulability and many medium-size messages. However, simple heuristic
approaches have received a lot of attention from the real-time community and
could be used to guide the search efficiently in CP. We hope to achieve better
results with a more efficient heuristic inspired by the best one designed for
real-time systems and coupled with the information learnt from the cuts. More
experiments have to be carried out to clearly establish the difficulty frontier.



6 Discussion on the Approach 

Our approach tries to use logic-based Benders decomposition as a means of
generating relevant nogoods. It is not far from the hybrid Branch and Check
framework of [21], which consists in checking the feasibility of a delayed part
of the problem in a subproblem. In our case, the schedulability problem is
gradually converted into the assignment problem. The idea is that the first
problem could be dealt with efficiently with constraint programming, and
especially with an efficient re-modeling process. In addition, it avoids
thrashing on schedulability inconsistencies. As with explanation-based
algorithms (MAC-DBT or Decision-repair [13]), it tries to learn from its
mistakes.

The technique is actually complete, but it could be interesting to relax its
completeness (from this point, we step back from Benders). One current problem
is the overload of the propagation mechanism caused by the accumulation of
constraints with low filtering power. We could use a tabu list of Benders cuts
and decide to keep permanently in memory the most accurate nogoods or only
those contributing to a stronger model (a fine management of memory can be
implemented thanks to the dynamic abilities of the master problem).
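
The memory-management idea sketched in the previous paragraph can be pictured as
follows. This is only an illustration under assumptions: the class name
`CutPool`, its methods, and the first-in-first-out eviction rule are
hypothetical choices and are not taken from the approach described in this
paper; cuts are treated as opaque objects.

```python
from collections import deque


class CutPool:
    """Hypothetical store for Benders cuts: a bounded tabu list plus permanent cuts."""

    def __init__(self, tabu_size):
        # deque(maxlen=...) drops the oldest entry automatically when it is full.
        self.tabu = deque(maxlen=tabu_size)
        self.permanent = []

    def add(self, cut, permanent=False):
        """Record a cut; permanent cuts are never evicted."""
        if permanent:
            self.permanent.append(cut)
        else:
            self.tabu.append(cut)

    def active_cuts(self):
        """Cuts that the master problem should currently enforce."""
        return self.permanent + list(self.tabu)
```

Because evicted cuts may have to be rediscovered, such a pool gives up the
completeness guarantee, which matches the trade-off mentioned above.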
One could also think of building a filtering algorithm on equation (8). However,
the objective is to show how precise nogoods could be used and to validate an
approach we intend to implement on complex scheduling models. As analysis
techniques quickly become very complex, a contradiction raised by a constraint
encapsulating such an analysis seems to be less relevant than a precise
explanation of failure.

The idea is to take advantage of the know-how of the real-time scheduling
community in a decomposition scheme such as the Benders one, where constraint
programming could efficiently solve the allocation problem.

7 Conclusion and Future Work 

In this paper, we propose a decomposition method built, to a certain extent, on
logic-based Benders decomposition as a way of generating nogoods. It implements
a logical duality to infer nogoods, tries to reinforce the constraint model, and
finally performs an incremental resolution of the master problem. It is also
strongly related to a class of algorithms that intend to learn from mistakes in
a systematic way by managing nogoods.
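
The overall loop summarized above can be illustrated with a deliberately
simplified sketch. Everything in it is an assumption made for illustration: the
master problem is solved by brute-force enumeration rather than by a CP solver,
the schedulability subproblem is replaced by the test "total utilization on a
processor is at most 100%", and the nogood is the whole set of tasks on a
failing processor (no QuickXplain-style minimization). The function names are
hypothetical.

```python
from itertools import product


def solve_master(n_tasks, n_procs, nogoods):
    """Return the first allocation violating no nogood, or None.

    Each nogood is a set of tasks that must not all share one processor
    (a NotAllEqual constraint on their processor variables).
    """
    for alloc in product(range(n_procs), repeat=n_tasks):
        if all(len({alloc[t] for t in ng}) > 1 for ng in nogoods):
            return alloc
    return None


def schedulable(tasks, utilizations):
    """Stand-in subproblem: utilizations are percentages, the bound is 100."""
    return sum(utilizations[t] for t in tasks) <= 100


def benders_allocate(utilizations, n_procs):
    """Alternate master and subproblems until an allocation passes all checks."""
    nogoods = []
    while True:
        alloc = solve_master(len(utilizations), n_procs, nogoods)
        if alloc is None:
            return None, nogoods          # master infeasible: no valid allocation
        failed = False
        for p in range(n_procs):          # one independent subproblem per processor
            group = frozenset(t for t, q in enumerate(alloc) if q == p)
            if not schedulable(group, utilizations):
                nogoods.append(group)     # learn a cut that excludes this allocation
                failed = True
        if not failed:
            return alloc, nogoods
```

Each learnt nogood excludes at least the allocation that produced it, so the
loop terminates on this finite search space.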

For independent tasks, the use of QuickXplain is critical to speed up the
convergence, but the limits seem to be reached for highly constrained and
inconsistent problems. Nevertheless, we believe that the difficulty can be
overcome through an efficient re-modeling process. An efficient heuristic to
guide the CP search is needed for communicating tasks when several hardness
aspects compete in the problem. As many traditional approaches in real-time
systems are based on heuristics, we hope to benefit from them; more experiments
have to be carried out on this point.

Our next step would be to compare our approach with other methods such as
traditional constraint and linear programming. We believe it would also be
interesting to extend our study to other kinds of network protocols (CAN, TDMA,
etc.) and to precedence constraints. Moreover, another kind of constraint
sometimes occurs: disjunction between sets of tasks. The disjunction global
constraint has not been studied much, and it could provide accurate modeling
and solving tools to tackle the assignment problem with more complex allocation
constraints.

Our approach gives a new answer to the problem of real-time task allocation.
It opens new perspectives on integrating, within CP and in a Benders scheme,
techniques coming from a broader horizon than optimization.
Abstract. The quantified constraint satisfaction problem (QCSP) is a
natural and useful generalization of the constraint satisfaction problem 
(CSP) in which both universal and existential quantification of variables 
is permitted. Because the CSP and QCSP are in general intractable, 
much effort has been directed towards identifying restricted cases of 
these problems that are tractable in polynomial time. In this paper, 
we investigate restricted cases of the QCSP having 2-semilattice poly- 
morphisms. We prove a complete classification of 2-semilattice polymor- 
phisms, demonstrating that each gives rise to a case of the QCSP that 
is either tractable in polynomial time, or coNP-hard. 



1 Introduction 

The constraint satisfaction problem (CSP) is widely acknowledged as a conve- 
nient framework for modelling search problems. An instance of the CSP consists 
of a set of variables, a domain, and a set of constraints; each constraint consists 
of a tuple of variables paired with a relation (over the domain) which contains 
permitted values for the variable tuple. The question is to decide whether or not 
there is an assignment mapping each variable to a domain element that satisfies 
all of the constraints. 

All of the variables in a CSP can be thought of as being implicitly existen- 
tially quantified. A natural generalization of the CSP is the quantified constraint 
satisfaction problem (QCSP), where variables may be both existentially and uni- 
versally quantified. Whereas the CSP concerns deciding the existence of a static 
object, a satisfying assignment, the QCSP concerns deciding the existence of a 
dynamic object: a strategy telling how to set the existentially quantified vari- 
ables in reaction to an arbitrary setting of the universally quantified variables, so 
that the constraints are satisfied. The generality of the QCSP framework permits 
the modelling of a variety of artificial intelligence problems that cannot be ex- 
pressed using the CSP, for instance, problems from the areas of planning, game 
playing, and non-monotonic reasoning. Of course, the relatively higher expres- 
siveness of the QCSP comes at the price of higher complexity: whereas the CSP 
is in general NP-complete, the QCSP is in general complete for the complexity 
class PSPACE, which is believed to be much larger than NP. 
The general intractability of the CSP and QCSP motivates the search for
cases of these problems that are tractable in polynomial time. A particularly
useful way to restrict the CSP and QCSP in order to obtain tractable cases
is to restrict the types of relations that may appear in constraints. Formally,
this is done by defining a constraint language to be a set of relations and then
defining, for each constraint language $\Gamma$, the problem CSP($\Gamma$)
(QCSP($\Gamma$)) to be the restricted version of the CSP (QCSP) where only
relations from the set $\Gamma$ may be present. This form of restriction can
capture and place into a unified framework many particular cases of the CSP
that have been independently investigated, such as Horn Satisfiability and
2-Satisfiability, as well as their corresponding QCSP generalizations,
Quantified Horn Satisfiability and Quantified 2-Satisfiability.

The class of problems CSP($\Gamma$) was first considered by Schaefer; he proved
a now classic classification theorem which states that for every constraint
language $\Gamma$ over a two-element domain, the problem CSP($\Gamma$) is
either in P or is NP-complete [25]. The non-trivial tractable cases of
CSP($\Gamma$) given by this result are the Horn Satisfiability,
2-Satisfiability, and XOR-Satisfiability problems. Over the past decade, many
more complexity classification theorems in the spirit of Schaefer's have been
established for different variants and generalizations of the CSP (see for
example the book [15]), including a classification theorem for the problems
QCSP($\Gamma$) in domain size two [15,16]. This classification theorem
demonstrates that, when $\Gamma$ is a constraint language over a two-element
domain, the only tractable problems of the form QCSP($\Gamma$) are
Quantified Horn Satisfiability [12], Quantified 2-Satisfiability [1],
and Quantified XOR-Satisfiability [15], reflecting exactly the non-trivial
tractable constraint languages provided by Schaefer's theorem; for all other
constraint languages $\Gamma$, the problem QCSP($\Gamma$) is PSPACE-complete.

In recent years, much effort has been directed towards the program of
classifying the complexity of CSP($\Gamma$) for all constraint languages
$\Gamma$ over a finite domain of arbitrary size. While this appears to be a
particularly challenging research problem, impressive progress has been made,
including the papers [18, 21, 19, 20, 17, 23, 22, 8, 4-6, 9, 3]. Many of these
papers make use of an intimate connection between CSP complexity and universal
algebra that has been developed [21, 19, 8]. The central notion used to
establish this connection is that of polymorphism; a constraint language has an
operation as polymorphism, roughly speaking, if each relation of the constraint
language satisfies a certain closure property defined in terms of the
operation. There are many results in the literature which demonstrate that if a
constraint language $\Gamma$ has a polymorphism of a certain type, then the
problem CSP($\Gamma$) is tractable.

Very recently, the study of QCSP complexity based on constraint languages 
in domains of arbitrary size was initiated [2, 13, 14]. It has been shown that the 
same polymorphism-based algebraic approach used to study CSP complexity 
can also be used to study QCSP complexity [2]. In the papers [2, 13, 14], general 
sufficient conditions for QCSP tractability have been identified, which, as with 
many of the CSP tractability results, demonstrate that the presence of a certain
type of polymorphism guarantees tractability of a constraint language. 

In this paper, we continue the study of QCSP complexity by investigat- 
ing constraint languages that have a 2-semilattice operation as polymorphism. 
A 2-semilattice operation is a binary operation * that satisfies the semilattice 
identities restricted to two variables, namely, the identities x * x = x (idempo- 
tence), x * y = y * x (commutativity), and (x * x) * y = x * (x * y) (restricted 
associativity). 2-semilattices constitute a natural generalization of semilattices, 
one of the initial classes of polymorphisms shown to guarantee CSP tractability 
[21], and have been shown to imply CSP tractability via a consistency-based 
algorithm [3]. We prove a full classification of 2-semilattice polymorphisms for 
QCSPs, showing that some such polymorphisms guarantee tractability, while 
others do not. We would like to highlight three reasons as to why we believe our 
study of 2-semilattice polymorphisms in the QCSP setting is interesting. 

First, as pointed out previously in [3], 2-semilattice polymorphisms play an 
important role in the investigation of maximal constraint languages, which are 
constraint languages that can express any relation when augmented with any 
relation not expressible by the language. Because a constraint language that 
can express all relations is intractable, maximal constraint languages are the 
largest constraint languages that could possibly be tractable (in either the CSP 
or QCSP setting); hence, studying maximal constraint languages allows one to 
obtain the most general tractability results possible. (It is worth noting here that 
all of the tractability results identified by Schaefer’s theorem apply to maximal 
constraint languages.) It follows from a theorem of Rosenberg [24] that maximal 
constraint languages can be classified into five types; for four of these types of 
constraint languages, QCSP tractability or QCSP intractability can be derived 
using established results. For the remaining type - constraint languages having 
a non-projection binary idempotent operation as polymorphism - a complexity 
classification has not yet been established. The present work constitutes a step 
towards understanding this remaining type. We mention that in the CSP setting, 
the tractability of 2-semilattices has been leveraged to give complete complexity 
classifications of maximal constraint languages for domains of size three and four 
[7,3]. 

Second, our tractability proofs make use of and validate new machinery for
proving QCSP tractability that was developed in [14]. In particular, a key idea
from [14] that we make use of here is that of collapsibility: roughly speaking,
a problem QCSP($\Gamma$) is $j$-collapsible if any problem instance can be
reduced to deciding the truth of a conjunction of QCSPs, each of which has a
constant number of universal quantifiers and is derived from the original
instance by collapsing together universally quantified variables. In [14], it
was demonstrated that many constraint languages $\Gamma$ are such that (1)
CSP($\Gamma$) is tractable and (2) the problem QCSP($\Gamma$) is
$j$-collapsible; it was also demonstrated that these two properties can be used
together to derive the tractability of QCSP($\Gamma$). We give another class of
constraint languages having these properties, by showing that QCSP problems
having tractable 2-semilattice polymorphisms are 1-collapsible.
This provides further evidence that collapsibility is a fruitful and useful tool for
studying QCSP complexity. Moreover, we believe our proof of 1-collapsibility to 
be the most non-trivial collapsibility proof to date. 

Third, although all 2-semilattice polymorphisms are tractable in the CSP 
setting, 2-semilattice polymorphisms intriguingly yield two modes of behavior 
in the QCSP setting: some 2-semilattice polymorphisms guarantee tractability, 
while others do not. This is surprising in light of the fact that for all other types 
of polymorphisms (of non-trivial constraint languages) that have been inves- 
tigated, polymorphisms that guarantee CSP tractability also guarantee QCSP 
tractability. In fact, our results imply the first and only known examples of 
non-trivial constraint languages that are CSP tractable, but QCSP intractable¹.
The existence of such constraint languages implies that the boundary between 
tractability and intractability in the QCSP context is genuinely different from 
the corresponding boundary in the CSP context. 

The contents of this paper are as follows. We present the basic terminology 
and concepts to be used throughout the paper in a preliminaries section (Sec- 
tion 2). We prove a classification theorem which shows that every 2-semilattice 
polymorphism gives rise to a case of the QCSP that is either tractable in poly- 
nomial time, or is coNP-hard; and, we derive some consequences of this the- 
orem (Section 3). We then demonstrate, for the QCSPs having a tractable 2- 
semilattice polymorphism, a result significantly stronger than mere polynomial 
time tractability: we show that such QCSPs are 1-collapsible, the strongest pos- 
sible statement one can show concerning collapsibility (Section 4). 



2 Preliminaries 

We use [n] to denote the set containing the first n positive integers, that is, 
{1, . . . , n}. 

2.1 Quantified Constraint Satisfaction 

Quantified formulas. A domain is a nonempty set of finite size. A tuple (over
domain $D$) is an element of $D^k$ for some $k \ge 1$, and is said to have
arity $k$. The $i$th coordinate of a tuple $t$ is denoted by $t_i$. A relation
(over domain $D$) is a subset of $D^k$ for some $k \ge 1$, and is said to have
arity $k$.

A constraint is an expression of the form $R(v)$, where $R$ is a relation and
$v$ is a tuple of variables such that $R$ and $v$ have the same arity. A
constraint network is a finite set of constraints, all of which have relations
over the same domain, and is said to be over the variable set $V$ if all of its
constraints have variables from $V$. Throughout, we let $D$ denote the domain
of our constraints and constraint networks.

1 Here, by a non-trivial constraint language, we mean a constraint language that in- 
cludes each constant as a relation. 
Definition 1. A quantified formula is an expression of the form
$Q_1 v_1 \ldots Q_n v_n \mathcal{C}$, where each $Q_i$ is a quantifier from the
set $\{\forall, \exists\}$, and $\mathcal{C}$ is a constraint network over the
variable set $\{v_1, \ldots, v_n\}$.

The quantified formula $Q_1 v_1 \ldots Q_n v_n \mathcal{C}$ is said to have
$Q_1 v_1 \ldots Q_n v_n$ as its quantifier prefix. We say that the variable
$v_i$ comes (strictly) before the variable $v_j$ if $i \le j$ ($i < j$). We let
$V_\phi$, $Y_\phi$, and $X_\phi$ denote the variables, universally quantified
variables, and existentially quantified variables of a quantified formula
$\phi$, respectively; we drop the $\phi$ subscript when it is understood from
the context. We generally assume that the universally quantified variables (of
a quantified formula) are denoted $y_1, \ldots, y_{|Y|}$, where $y_i$ comes
strictly before $y_j$ for $i < j$; similarly, we generally assume that the
existentially quantified variables (of a quantified formula) are denoted
$x_1, \ldots, x_{|X|}$, where $x_i$ comes strictly before $x_j$ for $i < j$.
When $W$ is a non-empty subset of the variable set (of a quantified formula
$\phi$), we use $\mathrm{first}_\phi(W)$ to denote the unique variable in $W$
coming before all of the other variables in $W$. For a subset $W$ of the
variable set $V$ (of a quantified formula) and a variable $v$ of $V$, we let
$W[\le v]$ ($W[< v]$) denote the set of variables in $W$ coming (strictly)
before $v$.

Strategies and truth. A constraint $R(v_1, \ldots, v_k)$ is satisfied by an
assignment $f$ defined on $\{v_1, \ldots, v_k\}$ if
$(f(v_1), \ldots, f(v_k)) \in R$. A constraint network $\mathcal{C}$ (over the
variable set $V$) is satisfied by an assignment $f : V \to D$ if each
constraint $C$ in $\mathcal{C}$ is satisfied by $f$.

Definition 2. A strategy $\sigma$ is a sequence of mappings

$$\{\sigma_i : D^{\mathrm{rank}(\sigma_i)} \to D\}_{i \in [n]}$$

where the $i$th mapping $\sigma_i$ is a function over $D$ having rank
$\mathrm{rank}(\sigma_i) \ge 0$.

Note that when $\sigma_i$ is a mapping of a strategy such that
$\mathrm{rank}(\sigma_i) = 0$, we consider $\sigma_i$ to be a constant, that
is, an element of $D$.

A strategy for the quantified formula $\phi$ is a strategy
$\sigma_1, \ldots, \sigma_{|X_\phi|}$ where for $i \in [|X_\phi|]$, the mapping
$\sigma_i$ has rank $|Y_\phi[< x_i]|$. An adversary for the quantified formula
$\phi$ is a function $\tau : Y_\phi \to D$. When $\sigma$ is a strategy and
$\tau$ is an adversary for the quantified formula $\phi$, the outcome of
$\sigma$ and $\tau$, denoted by
$\mathrm{outcome}(\sigma, \tau) : V_\phi \to D$, is the assignment defined by

$$\mathrm{outcome}(\sigma, \tau)(x_i) = \sigma_i(\tau(y_1), \ldots, \tau(y_{|Y[< x_i]|}))$$

for $x_i \in X_\phi$, and $\mathrm{outcome}(\sigma, \tau)(y_j) = \tau(y_j)$ for
$y_j \in Y_\phi$.

A strategy $\sigma$ for the quantified formula $\phi$ is said to be winning if
for all adversaries $\tau$ for $\phi$, the assignment
$\mathrm{outcome}(\sigma, \tau) : V_\phi \to D$ satisfies the constraint
network $\mathcal{C}$ of $\phi$. We consider a quantified formula $\phi$ to be
true if there exists a winning strategy for $\phi$. (This is one of many
equivalent ways to define truth of a quantified formula.)
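
As an illustration, one of the equivalent ways to decide truth is to evaluate
the quantifier prefix from left to right: a universal variable must succeed for
every domain value, and an existential variable for at least one. The following
sketch does exactly that. The representation is an assumption made for this
example (quantifiers written as "A" and "E", relations as Python sets of
tuples), the name `is_true` is hypothetical, and the running time is
exponential in the number of variables.

```python
def is_true(prefix, constraints, domain, assignment=None):
    """Decide truth of a quantified formula by exhaustive evaluation.

    prefix:      list of (quantifier, variable), quantifier "A" or "E",
                 with pairwise distinct variable names
    constraints: list of (relation, variables); relation is a set of tuples
    domain:      list of domain values
    """
    assignment = dict(assignment or {})
    if len(assignment) == len(prefix):
        # Every variable is set: check the constraint network.
        return all(
            tuple(assignment[v] for v in variables) in relation
            for relation, variables in constraints
        )
    quantifier, var = prefix[len(assignment)]   # next variable in prefix order
    branches = (
        is_true(prefix, constraints, domain, {**assignment, var: d})
        for d in domain
    )
    return all(branches) if quantifier == "A" else any(branches)
```

With the disequality relation on a two-element domain, the formula
"for all y there exists x with x different from y" evaluates to true, while
swapping the two quantifiers makes it false.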

Problem formulation. In this paper, we focus on restricted versions of the 
QCSP where all relations must come from a constraint language. A constraint 
language is defined to be a set of relations (not necessarily of the same arity), 
all of which are over the same domain. 
Definition 3. Let $\Gamma$ be a constraint language. The QCSP($\Gamma$) problem
is to decide, given as input a quantified formula $\phi$ with constraints
having relations from $\Gamma$, whether or not $\phi$ is true.

We define the CSP($\Gamma$) problem to be the restriction of the QCSP($\Gamma$)
problem to instances where all quantifiers are existential.

Polymorphisms. A powerful algebraic theory for studying the complexity of
CSP($\Gamma$) problems was introduced in [21, 19]; it can also be applied to
study the complexity of QCSP($\Gamma$) problems [2]. (We refer the reader to
those papers for more information.) An operation $\mu : D^k \to D$ is a
polymorphism of a relation $R \subseteq D^m$ if for all tuples
$t_1, \ldots, t_k \in R$, the tuple

$$(\mu(t_{11}, \ldots, t_{k1}), \ldots, \mu(t_{1m}, \ldots, t_{km}))$$

is in $R$. An operation $\mu : D^k \to D$ is a polymorphism of a constraint
language $\Gamma$ if $\mu$ is a polymorphism of all relations $R \in \Gamma$.
It has been shown that the complexity of CSP($\Gamma$) (and QCSP($\Gamma$)) is
tightly connected to the polymorphisms of $\Gamma$; in particular, if two
finite constraint languages $\Gamma_1$, $\Gamma_2$ share exactly the same
polymorphisms, then CSP($\Gamma_1$) and CSP($\Gamma_2$) are reducible to each
other via many-one polynomial time reductions (and likewise for
QCSP($\Gamma_1$) and QCSP($\Gamma_2$)). When $\mu : D^k \to D$ is an operation,
we use $\mathrm{Inv}(\mu)$ to denote the set of all relations having $\mu$ as
polymorphism.
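
The closure condition in the definition of a polymorphism can be checked
directly on small relations. The sketch below is only an illustration: the
function name `is_polymorphism` and the encoding of a relation as a Python set
of equal-length tuples are assumptions of this example.

```python
from itertools import product


def is_polymorphism(op, arity, relation):
    """Check whether a k-ary operation preserves a relation.

    op:       callable taking `arity` domain elements
    arity:    k, the arity of the operation
    relation: set of tuples, all of the same length m
    """
    for rows in product(relation, repeat=arity):      # choose t_1, ..., t_k from R
        # Apply op coordinate-wise: column j holds (t_1j, ..., t_kj).
        image = tuple(op(*column) for column in zip(*rows))
        if image not in relation:
            return False
    return True
```

For instance, `min` preserves the order relation {(0,0), (0,1), (1,1)} on
{0, 1}, but not the disequality relation {(0,1), (1,0)}, since applying it
coordinate-wise to (0,1) and (1,0) yields (0,0).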

Definition 4. Let $\mu : D^k \to D$ be an operation. The QCSP($\mu$) problem is
to decide, given as input a quantified formula $\phi$ with constraints having
relations from $\mathrm{Inv}(\mu)$, whether or not $\phi$ is true.

In other words, the QCSP($\mu$) problem is that of deciding the truth of a
quantified formula where all of the relations have $\mu$ as a polymorphism.

2.2 Collapsibility 

In this paper, we will make use of machinery developed in [14] (on which the 
material in this subsection is based) to prove tractability results for the QCSP. 

Collapsings of formulas. Define a quantified formula $\phi'$ to be a
$j$-collapsing of a quantified formula $\phi$ if there exists a subset $Y'$ of
$Y_\phi$ such that $|Y'| = \min(j, |Y_\phi|)$ and $\phi'$ can be obtained from
$\phi$ by first eliminating, from the quantifier prefix of $\phi$, all
variables in $Y_\phi \setminus Y'$ except for
$\mathrm{first}_\phi(Y_\phi \setminus Y')$; and then replacing, in the
constraint network of $\phi$, all variables in $Y_\phi \setminus Y'$ with
$\mathrm{first}_\phi(Y_\phi \setminus Y')$. For example, the quantified formula

$$\forall y_1 \exists x_1 \forall y_2 \forall y_3 \exists x_2 \{R_1(y_1, x_1), R_2(y_3, x_2), R_3(y_2, y_3, x_2)\}$$

has the following three 1-collapsings:

$$\forall y_1 \exists x_1 \forall y_2 \exists x_2 \{R_1(y_1, x_1), R_2(y_2, x_2), R_3(y_2, y_2, x_2)\}$$
$$\forall y_1 \exists x_1 \forall y_2 \exists x_2 \{R_1(y_1, x_1), R_2(y_1, x_2), R_3(y_2, y_1, x_2)\}$$
$$\forall y_1 \exists x_1 \forall y_3 \exists x_2 \{R_1(y_1, x_1), R_2(y_3, x_2), R_3(y_1, y_3, x_2)\}.$$

Note that if a quantified formula $\phi$ is true, then for any $j \ge 1$ and
any $j$-collapsing $\phi'$ of $\phi$, the quantified formula $\phi'$ is true.
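
The construction of $j$-collapsings is mechanical, and the following sketch
reproduces it. The data representation is an assumption of this example
(quantifiers written as "A" and "E", constraints as pairs of a relation name
and a variable tuple), and the function name `collapsings` is hypothetical.

```python
from itertools import combinations


def collapsings(prefix, constraints, j):
    """Enumerate the j-collapsings of a quantified formula.

    prefix:      list of (quantifier, variable), quantifier "A" or "E"
    constraints: list of (relation_name, variable_tuple)
    Returns a list of (new_prefix, new_constraints), one per choice of Y'.
    """
    universals = [v for q, v in prefix if q == "A"]
    size = min(j, len(universals))
    results = []
    for kept in combinations(universals, size):                # the subset Y'
        merged = [y for y in universals if y not in kept]      # Y \ Y', prefix order
        if not merged:
            results.append((list(prefix), list(constraints)))
            continue
        first = merged[0]                                      # first(Y \ Y')
        # Drop every merged variable from the prefix except the first one ...
        new_prefix = [(q, v) for q, v in prefix if v not in merged or v == first]
        # ... and rename all merged variables to it in the constraints.
        new_constraints = [
            (name, tuple(first if v in merged else v for v in variables))
            for name, variables in constraints
        ]
        results.append((new_prefix, new_constraints))
    return results
```

Applied with $j = 1$ to the example formula above, the function returns the
three 1-collapsings listed there, in the order $Y' = \{y_1\}$, $\{y_2\}$,
$\{y_3\}$.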

Define a set of functions $F$ of the form $\tau : Y \to D$ to be a
$j$-adversary set for $Y$ if there exists a subset $Y' \subseteq Y$ of size
$\min(j, |Y|)$ such that $F$ contains exactly those functions $\tau$ having the
property that $\tau(y_1) = \tau(y_2)$ for all $y_1, y_2 \in Y \setminus Y'$.
When $\phi$ is a quantified formula and $F$ is a set of adversaries
$\tau : Y_\phi \to D$, we say that the formula $\phi$ is $F$-winnable if there
exists a strategy $\sigma$ for $\phi$ such that for all $\tau \in F$, the
assignment $\mathrm{outcome}(\sigma, \tau)$ satisfies the constraint network of
$\phi$. As an alternative to saying that the formula $\phi$ is $F$-winnable, we
will say that $F$ is winnable when $\phi$ is understood from the context.

It is straightforward to verify that the $j$-collapsing of a quantified formula
corresponding to a subset $Y'$ is true if and only if the formula is
$F$-winnable, where $F$ is the $j$-adversary set corresponding to $Y'$. Hence,
we have the following proposition.

Proposition 5. Let $\phi$ be a quantified formula. The $j$-collapsings $\phi'$
of $\phi$ are all true if and only if for each $j$-adversary set $F$ for
$Y_\phi$, the formula $\phi$ is $F$-winnable.

Collapsibility of problems. We say that a problem of the form QCSP($\mu$) is
$j$-collapsible if for every quantified formula $\phi$ that is an instance of
QCSP($\mu$), the following property holds: if all $j$-collapsings of $\phi$ are
true, then $\phi$ is true. We say that a problem QCSP($\mu$) has bounded
collapsibility if there exists a $j \ge 1$ such that QCSP($\mu$) is
$j$-collapsible. The following notion of composability is useful for
identifying problems of the form QCSP($\mu$) that have bounded collapsibility.

When $\mu : D^k \to D$ is a function, and $F, F_1, \ldots, F_k$ are sets of
functions of the form $\tau : \{y_1, \ldots, y_n\} \to D$, we say that $F$ is
$\mu$-composable in one step from $F_1, \ldots, F_k$ if there exist strategies
$\pi^1 = \{\pi^1_i : D^i \to D\}_{i \in [n]}, \ldots,
\pi^k = \{\pi^k_i : D^i \to D\}_{i \in [n]}$ such that for all $\tau \in F$,
the following two properties hold:
(1) for all $i \in [k]$, the mapping $\tau^i : \{y_1, \ldots, y_n\} \to D$
defined by $\tau^i(y_j) = \pi^i_j(\tau(y_1), \ldots, \tau(y_j))$ (for all
$j \in [n]$) is contained in $F_i$, and (2) it holds that
$\tau(y_j) = \mu(\tau^1(y_j), \ldots, \tau^k(y_j))$ (for $j \in [n]$). The key
feature of this definition is the following lemma.

Lemma 6. Let $\phi$ be a quantified formula with $\{y_1, \ldots, y_n\}$ as its
universally quantified variables and with relations invariant under $\mu$, and
suppose that $F$ is $\mu$-composable in one step from $F_1, \ldots, F_k$. If
$\phi$ is $F_i$-winnable for all $i \in [k]$, then $\phi$ is $F$-winnable.

The condition of bounded collapsibility can be combined with CSP tractability
to infer QCSP tractability results. Recall that an operation $\mu : D^k \to D$
is idempotent when $\mu(d, \ldots, d) = d$ for all $d \in D$.

Theorem 7. Suppose that $\mu$ is an idempotent operation such that QCSP($\mu$)
has bounded collapsibility and CSP($\mu$) is decidable in polynomial time.
Then, QCSP($\mu$) is decidable in polynomial time.
3 Classification of 2-Semilattices

In this section, we present a complete complexity classification of 2-semilattice 
operations in quantified constraint satisfaction. We first formally state the clas- 
sification theorem (Theorem 8) and discuss some of its implications; then, we 
prove the theorem in two parts. 

3.1 Statement of Classification Theorem and Implications 

A 2-semilattice operation is a binary operation $* : D^2 \to D$ such that for
all $x, y \in D$ it holds that $x * x = x$ (idempotence), $x * y = y * x$
(commutativity), and $(x * x) * y = x * (x * y)$ (restricted associativity).
Every 2-semilattice operation $* : D^2 \to D$ induces a directed graph
$\mathcal{G}_* = (D, E)$ with edge set
$E = \{(a, b) \in D \times D : a * b = b\}$. We use $\mathcal{C}_*$ to denote
the set of strongly connected components (or components, for short) of
$\mathcal{G}_*$, and let $\le$ be the binary relation on $\mathcal{C}_*$ where
for $C_1, C_2 \in \mathcal{C}_*$, it holds that $C_1 \le C_2$ if and only if
there exist vertices $v_1 \in C_1$, $v_2 \in C_2$ such that there is a path (in
$\mathcal{G}_*$) from $v_1$ to $v_2$. It is straightforward to verify that
$\le$ is a partial order. We say that $C \in \mathcal{C}_*$ is a minimal
component if it is minimal with respect to $\le$, that is, for all
$C' \in \mathcal{C}_*$, $C' \le C$ implies $C' = C$.

Our classification theorem demonstrates that 2-semilattice operations give
rise to two modes of behavior in QCSP complexity, depending on the structure
of the graph $\mathcal{G}_*$.

Theorem 8. Let $* : D^2 \to D$ be a 2-semilattice operation. If there is a
unique minimal component in $\mathcal{C}_*$, then QCSP($*$) is decidable in
polynomial time. Otherwise, QCSP($*$) is coNP-hard.
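
The criterion of Theorem 8 is easy to test on a concrete operation. The sketch
below is an illustration only; the function names and the encoding of the
operation as a Python callable over a list of domain values are assumptions of
this example. It checks the 2-semilattice identities, computes reachability in
the induced graph, and counts the minimal strongly connected components.

```python
from itertools import product


def is_2_semilattice(domain, op):
    """Idempotence, commutativity, and restricted associativity."""
    return all(
        op(x, x) == x
        and op(x, y) == op(y, x)
        and op(op(x, x), y) == op(x, op(x, y))
        for x, y in product(domain, repeat=2)
    )


def minimal_components(domain, op):
    """Minimal strongly connected components of the graph with edges a -> b iff a*b = b."""
    # reach[a] starts as the out-neighbours of a (a itself included by idempotence).
    reach = {a: {b for b in domain if op(a, b) == b} for a in domain}
    changed = True
    while changed:                       # naive transitive closure, fine for small D
        changed = False
        for a in domain:
            new = set().union(*(reach[b] for b in reach[a]))
            if not new <= reach[a]:
                reach[a] |= new
                changed = True
    components = {frozenset(b for b in reach[a] if a in reach[b]) for a in domain}
    # A component is minimal if every vertex that reaches it lies inside it.
    return {
        c for c in components
        if all(v in c for v in domain if any(a in reach[v] for a in c))
    }


def classify(domain, op):
    """Apply the dichotomy: unique minimal component or not."""
    if not is_2_semilattice(domain, op):
        raise ValueError("not a 2-semilattice operation")
    unique = len(minimal_components(domain, op)) == 1
    return "polynomial time" if unique else "coNP-hard"
```

For example, `max` on {0, 1, 2} has the single minimal component {0}, whereas
the operation on {0, 1, 2} that returns x when x = y and 2 otherwise has the
two minimal components {0} and {1}.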

One implication of this classification theorem is a complete classification of 
semilattice operations in QCSP complexity. Recall that a semilattice operation 
is a binary operation that is associative, commutative, and idempotent. We say
that a semilattice operation $* : D^2 \to D$ has a unit element if there exists
an element $u \in D$ such that for all $d \in D$, $d * u = u * d = d$.

Corollary 9. Let $* : D^2 \to D$ be a semilattice operation. If $*$ has a unit
element, then QCSP($*$) is decidable in polynomial time. Otherwise, QCSP($*$)
is coNP-hard.

Proof. When $*$ is a semilattice operation, it is straightforward to verify
that each component in $\mathcal{C}_*$ is of size one. Hence, $\mathcal{C}_*$
has a unique minimal component if and only if $*$ has a unit element. □

We note that the tractability of semilattice operations with unit has been
previously derived [14].

Another implication of our classification theorem is the tractability of all
commutative conservative operations. A commutative conservative operation is
a binary operation $* : D^2 \to D$ that is commutative and conservative. (We
say that $*$ is conservative if for all $x, y \in D$ it holds that
$x * y \in \{x, y\}$.) Such operations were studied in the context of the CSP
by Bulatov and Jeavons [11].




176 



Hubie Chen 



Corollary 10. Let $* : D^2 \to D$ be a commutative conservative operation. The
problem QCSP($*$) is decidable in polynomial time.

Proof. It is straightforward to verify that when $*$ is a commutative
conservative operation, the relation $\le$ on $\mathcal{C}_*$ is a total
ordering, and hence has a unique minimal component. □

The proof of Theorem 8 can be generalized to a multi-sorted version of the 
QCSP. We refer the reader to [9] for the definition and a study of the multi-sorted 
CSP; the definition of the multi-sorted QCSP is analogous to this definition. 

Theorem 11. Let $*$ be a multi-sorted binary operation over a finite collection
of domains $\mathcal{D}$, such that the interpretation $*_D$ of $*$ on any
domain $D \in \mathcal{D}$ is a 2-semilattice operation. If for every domain
$D \in \mathcal{D}$ there is a unique minimal component in
$\mathcal{C}_{*_D}$, then QCSP($*$) is decidable in polynomial time. Otherwise,
QCSP($*$) is coNP-hard.



3.2 Proof 

We prove Theorem 8 in two parts: the tractable cases are established in Theorem 
12, and the intractable cases are established in Theorem 13. 

Theorem 12. Let $* : D^2 \to D$ be a 2-semilattice operation. If there is a
unique minimal component in $\mathcal{C}_*$, then QCSP($*$) is decidable in
polynomial time.

Proof. First, fix an element $b$ in the minimal component of $\mathcal{C}_*$.
For every element $d \in D$, there exists a path from $b$ to $d$ in
$\mathcal{G}_*$. (This is because for every component $C$ of $\mathcal{C}_*$ it
holds that $B \le C$, where $B$ denotes the minimal component of
$\mathcal{C}_*$.) Let $k$ be a sufficiently large integer so that for every
$d \in D$, there is a path from $b$ to $d$ of length less than or equal to $k$.
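
For a concrete operation, a suitable value of $k$ can be found by breadth-first
search from $b$ in the induced graph. The following sketch is an illustration
under assumptions: the function name `path_bound` is hypothetical, the
operation is given as a Python callable, and the function returns the smallest
admissible $k$ (any larger value also satisfies the requirement).

```python
from collections import deque


def path_bound(domain, op, b):
    """Smallest k such that every element is reachable from b by a path of length <= k.

    Edges of the graph are the pairs (a, c) with op(a, c) == c.
    """
    dist = {b: 0}
    queue = deque([b])
    while queue:
        a = queue.popleft()
        for c in domain:
            if op(a, c) == c and c not in dist:
                dist[c] = dist[a] + 1
                queue.append(c)
    if len(dist) != len(domain):
        raise ValueError("b does not reach every element of the domain")
    return max(dist.values())
```

For `max` on {0, 1, 2} with b = 0 this gives k = 1, so the collapsibility
level $2^k - 1$ used in the proof equals 1 in that case.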

We show that QCSP(*) is (2 fc — l)-collapsible; the result then follows from 
Theorem 7 along with the tractability of CSP(*) (see [3]). Let cf> be an instance 
of QCSP(*) where all (2 k — l)-collapsings of <f> are true. Then, we have that for 
each (2 k — l)-adversary set F for Y, the set F is winnable. For W C Y and 
d £ D, define the set P(w,d) to be the set of adversaries r : Y — > D such that 
r(y) = d for all y £ Y \ W. We prove that for every universal variable subset 
W C Y with | W | > 2 k — 1, the set -Fjvyb) is winnable. This suffices, as it implies 
that the set F( Y j,\, which is the set of all adversaries r : Y — » D, is winnable. 

The proof is by induction on $|W|$. When $|W| = 2^k - 1$, the claim is immediate
from the assumption that each $(2^k - 1)$-adversary set (for $Y$) is winnable,
along with the fact that any subset of a winnable adversary set is winnable. For
the induction, assume that $|W| > 2^k - 1$ and let $W_0$ be a subset of $W$ having size
$2^k$. Let $\{w_1, \ldots, w_{2^k}\}$ denote the elements of $W_0$. Let $F_i$ denote
$F(W \setminus \{w_i\}, b)$ (for all $i \in [2^k]$). We may assume by induction that $F_i$ is
winnable (for all $i \in [2^k]$).

We claim that there are strategies $\pi^1 = \{\pi^1_j : D^j \to D\}_{j \in [n]}, \ldots,
\pi^{2^k} = \{\pi^{2^k}_j : D^j \to D\}_{j \in [n]}$ such that for all $\tau \in F(W,b)$, the
following two properties hold:

(1) for all $i \in [2^k]$, the mapping $\tau^i : \{y_1, \ldots, y_n\} \to D$ defined by
$\tau^i(y_j) = \pi^i_j(\tau(y_1), \ldots, \tau(y_j))$ (for all $j \in [n]$) is contained in $F_i$, and

(2) for all $j \in [n]$, it holds that

$$\tau(y_j) = \big((\tau^1(y_j) * \tau^2(y_j)) * (\tau^3(y_j) * \tau^4(y_j))\big) * \cdots$$

where the right hand side of the above expression is a balanced binary tree
of depth $k$ with leaves $\tau^1(y_j), \ldots, \tau^{2^k}(y_j)$.

By appeal to Lemma 6, establishing this claim entails that $F(W,b)$ is winnable.
We consider two cases:

- For universal variables $y_j \in Y \setminus W_0$, we let $\pi^i_j$ be the projection onto the last
coordinate (for all $i$), so that $\tau^i(y_j) = \tau(y_j)$ and property (2) holds by the
idempotence of *. Note that since $\tau(y_j) = b$ when $y_j \in Y \setminus W$, we have that
$\tau^i(y_j) = b$ for such $y_j$ (for all $i$).

- For universal variables $y_j \in W_0$, it suffices by symmetry to show that the
polynomial

$$(((b * x_2) * (x_3 * x_4)) * \cdots)$$

that is a balanced binary tree of depth $k$ with leaves $b, x_2, \ldots, x_{2^k}$, is surjective.
Identifying the variables $x_{2^i+1}, \ldots, x_{2^{i+1}}$ to be equivalent (for $i$ ranging
from $0$ to $k - 1$), it suffices to show that the polynomial

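$$(\cdots((b * z_0) * z_1) * \cdots) * z_{k-1}$$

(where $z_i$ denotes the single variable resulting from the identification at stage $i$,
each identified subtree collapsing to $z_i$ by the idempotence of *)
is surjective. This follows immediately from our choice of $k$: for any element
$d \in D$, there is a path from $b$ to $d$ in Q* of length $k$. (Note that every vertex
in Q* has a self-loop, so as long as there is a path from $b$ to $d$ with length
less than or equal to $k$, there is a path from $b$ to $d$ with length equal to $k$.)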

□ 

We complete the classification of 2-semilattices by proving a complement to 
Theorem 12, namely, that the remaining 2-semilattices give rise to QCSPs that 
are intractable. 

Theorem 13. Let $* : D^2 \to D$ be a 2-semilattice operation. If there are two
or more minimal components in C*, then QCSP(*) is coNP-hard (even when
restricted to $\forall\exists$-formulas).

Proof. We show coNP-hardness by reducing from the propositional tautology
problem. Let $C(y_1, \ldots, y_n)$ be an instance of this problem, where $C$ is a circuit
with input gates having labels $y_1, \ldots, y_n$. We assume without loss of generality
that all non-input gates of $C$ are either AND or NOT gates, and assign all
non-input gates labels $x_1, \ldots, x_m$. The quantifier prefix of the resulting quantified
formula is $\forall y_1 \ldots \forall y_n \exists x_1 \ldots \exists x_m$. Let $B_0$ and $B_1$ be distinct minimal components
in C*, and let $b_0$ and $b_1$ be elements of $B_0$ and $B_1$, respectively. The constraint
network of the resulting quantified formula is constructed as follows.

For each AND gate $x_i$ with inputs $v, v' \in \{y_1, \ldots, y_n\} \cup \{x_1, \ldots, x_m\}$, include
the four constraints:

• $(v \in B_1) \wedge (v' \in B_1) \Rightarrow (x_i = b_1)$

• $(v \in B_0) \wedge (v' \in B_1) \Rightarrow (x_i = b_0)$

• $(v \in B_1) \wedge (v' \in B_0) \Rightarrow (x_i = b_0)$

• $(v \in B_0) \wedge (v' \in B_0) \Rightarrow (x_i = b_0)$

For each NOT gate $x_i$ with input $v \in \{y_1, \ldots, y_n\} \cup \{x_1, \ldots, x_m\}$, include
the two constraints:

• $(v \in B_0) \Rightarrow (x_i = b_1)$

• $(v \in B_1) \Rightarrow (x_i = b_0)$

For the output gate $x_o$, include the constraint:

• $(x_o \in B_0) \Rightarrow \mathrm{FALSE}$

It is fairly straightforward to verify that each of the given constraints has the
* operation as polymorphism; the key fact is that (for $i \in \{0, 1\}$) multiplying
any element of $D$ by an element $c$ outside of $B_i$ yields an element $c'$ outside of
$B_i$. (If not, there is an edge from $c$ to $c'$ by restricted associativity of *; this edge
gives a contradiction to the minimality of $B_i$.)

We verify the reduction to be correct as follows. 

Suppose that the original circuit was a tautology. Let $f : \{y_1, \ldots, y_n\} \to D$
be any assignment to the $\forall$-variables of the quantified formula. Define
$f' : \{y_1, \ldots, y_n\} \to (B_0 \cup B_1)$ by $f'(y_i) = f(y_i)$ if $f(y_i) \in B_0 \cup B_1$, and as an arbitrary
element of $B_0 \cup B_1$ otherwise. The AND and NOT gate constraints force each $x_i$
to have either the value $b_0$ or $b_1$ under $f'$; it can be verified that the assignment
taking $x_i$ to its forced value and $y_i$ to $f(y_i)$ satisfies all of the constraints.

Suppose that the original circuit was not a tautology. Let $g : \{y_1, \ldots, y_n\} \to \{0, 1\}$
be an assignment making the circuit $C$ false. Let $g' : \{y_1, \ldots, y_n\} \to D$
be an assignment to the $\forall$-variables of the quantified formula such that
$g'(y_i) \in B_{g(y_i)}$ (for all $i \in [n]$). Under the assignment $g'$, the only assignment to the
$\exists$-variables $x_i$ satisfying the AND and NOT gate constraints is the mapping taking
$x_i$ to $b_0$ if the gate with label $x_i$ has value 0 under $g$, and $b_1$ if the gate with
label $x_i$ has value 1 under $g$. Hence, if all of these constraints are satisfied, then
the output gate constraint must be falsified. We conclude that no assignment to
the $\exists$-variables $x_i$ satisfies all of the constraints under the assignment $g'$, and so
the quantified formula is false. □

4 Tractable 2-Semilattices Are 1-Collapsible 

In the previous section, it was shown that for certain 2-semilattice operations *, 
the problem QCSP(*) is polynomial-time tractable. For these operations, it was 
proved that QCSP(*) is j-collapsible for some constant j. However, the given 
proof demonstrated j-collapsibility for constants j that could be arbitrarily large, 
depending on the operation *. In this section, we refine this result by proving 
the strongest possible statement concerning the collapsibility of QCSP(*): the 
problem QCSP(*) is 1-collapsible whenever QCSP(*) is tractable. 
Theorem 14. Let $* : D^2 \to D$ be a 2-semilattice operation. If there is a unique
minimal component in C*, then QCSP(*) is 1-collapsible.

In the proof of this theorem, we write adversaries (and sets of adversaries)
using tuple notation. We also use the notation $F \lhd F_1 * F_2$ to denote that $F$ is
*-composable in one step from $F_1$ and $F_2$.

Proof. Fix $d_0$ to be any element of the unique minimal component of C*. For
every element $d \in D$, there exists a path from $d_0$ to $d$ in Q*. Hence, it is possible
to select sufficiently large integers $K$ and $L$ and elements $\{d^j_i\}_{i \in [L], j \in [K]}$ so that:

- each of the sets

$$P^1 = \{d^1_0, d^1_1, \ldots, d^1_L\}, \quad \ldots, \quad P^K = \{d^K_0, d^K_1, \ldots, d^K_L\}$$

is a path in the sense that, for all $j \in [K]$, all of the pairs

$$(d^j_0, d^j_1), (d^j_1, d^j_2), \ldots, (d^j_{L-1}, d^j_L)$$

are edges in Q*; and,

- all elements of $D$ lie on one of these paths, that is, $D = \bigcup_{j=1}^{K} P^j$.

Here, we use $d^1_0, \ldots, d^K_0$ as alternative notation for $d_0$, that is,

$$d_0 = d^1_0 = \cdots = d^K_0.$$

For $i = 0, \ldots, L$ we define

$$D_i = \{d^1_i, \ldots, d^K_i\}$$

and we define

$$E_i = D_0 \cup \cdots \cup D_i.$$

Notice that $E_L = D$.

To prove 1-collapsibility, we fix a problem instance, and assume that each
1-adversary set is winnable. In particular, we assume that each of the sets

$$D \times \{d_0\} \times \cdots \times \{d_0\}, \quad \ldots, \quad \{d_0\} \times \cdots \times \{d_0\} \times D$$

is winnable. Our goal is to prove that $D \times \cdots \times D$ is winnable.

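We prove by induction that for all $i = 0, \ldots, L$ the set $E_i \times \cdots \times E_i$ is
winnable. The base case $i = 0$ holds by assumption. For the induction, assume
that $E_i \times \cdots \times E_i$ is winnable (where $i \geq 0$); we show that $E_{i+1} \times \cdots \times E_{i+1}$ is
winnable.

We first show that $E_{i+1} \times E_i \times \cdots \times E_i$ is winnable. We have

$$(E_{i+1} \times E_0 \times \cdots \times E_0) \lhd (E_i \times E_i \times \cdots \times E_i) * (D \times \{d_0\} \times \cdots \times \{d_0\}).$$

Then, we have

$$(E_{i+1} \times E_1 \times \cdots \times E_1) \lhd (E_i \times E_i \times \cdots \times E_i) * (E_{i+1} \times E_0 \times \cdots \times E_0)$$

and

$$(E_{i+1} \times E_2 \times \cdots \times E_2) \lhd (E_i \times E_i \times \cdots \times E_i) * (E_{i+1} \times E_1 \times \cdots \times E_1).$$

Continuing in this manner, we can show that $E_{i+1} \times E_i \times \cdots \times E_i$ is winnable.
By symmetric arguments, we can show the winnability of the sets

$$(E_i \times E_{i+1} \times E_i \times \cdots \times E_i), \quad \ldots, \quad (E_i \times E_i \times \cdots \times E_i \times E_{i+1}).$$

We have the winnability of $E_{i+1} \times E_{i+1} \times E_i \times \cdots \times E_i$ by

$$(E_{i+1} \times E_{i+1} \times E_i \times \cdots \times E_i) \lhd (E_{i+1} \times E_i \times E_i \times \cdots \times E_i) * (E_i \times E_{i+1} \times E_i \times \cdots \times E_i)$$

and we have the winnability of $E_{i+1}^3 \times E_i \times \cdots \times E_i$ by

$$(E_{i+1}^3 \times E_i \times \cdots \times E_i) \lhd (E_{i+1}^2 \times E_i \times E_i \times \cdots \times E_i) * (E_i^2 \times E_{i+1} \times E_i \times \cdots \times E_i).$$

Proceeding in this fashion, we have the winnability of $E_{i+1} \times \cdots \times E_{i+1}$, as
desired. □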
Abstract. The constraint satisfaction problem (CSP) can be formulated
as the problem of deciding, given a pair (A, B) of relational structures,
whether or not there is a homomorphism from A to B. Although
the CSP is in general intractable, it may be restricted by requiring the 
“target structure” B to be fixed; denote this restriction by CSP(B). In 
recent years, much effort has been directed towards classifying the com- 
plexity of all problems CSP(B). The acquisition of CSP(B) tractability 
results has generally proceeded by isolating a class of relational structures 
B believed to be tractable, and then demonstrating a polynomial-time 
algorithm for the class. In this paper, we introduce a new approach to 
obtaining CSP(B) tractability results: instead of starting with a class of 
structures, we start with an algorithm called look-ahead arc consistency, 
and give an algebraic characterization of the structures solvable by our 
algorithm. This characterization is used both to identify new tractable 
structures and to give new proofs of known tractable structures. 



1 Introduction 

1.1 Background 

Constraint satisfaction problems arise in a wide variety of domains, such as com- 
binatorics, logic, algebra, and artificial intelligence. An instance of the constraint 
satisfaction problem (CSP) consists of a set of variables and a set of constraints 
on those variables; the goal is to decide whether or not there is an assignment to 
the variables satisfying all of the constraints. It is often convenient to cast the 
CSP as a relational homomorphism problem, namely, the problem of deciding, 
given a pair (A, B) of relational structures, whether or not there is a homomor- 
phism from A to B. In this formalization, each relation of A contains tuples 
of variables that are constrained together, and the corresponding relation of B 
contains the allowable tuples of values that the variable tuples may take. 
The CSP is NP-complete in general, motivating the search for
polynomial-time tractable cases of the CSP. A particularly useful way to restrict the CSP
in order to obtain tractable cases is to restrict the types of constraints that 
may be expressed, by requiring the “target structure” B to be fixed; denote 
this restriction by CSP(B). This form of restriction can capture and place into 
a unified framework many particular cases of the CSP that have been indepen- 
dently investigated - for instance, the Horn Satisfiability, 2-Satisfiability, 
and Graph H-Colorability problems. Schaefer was the first to consider the 
class of problems CSP(B); he proved a now famous dichotomy theorem, showing 
that for all relational structures B over a two-element universe, CSP(B) is ei- 
ther tractable in polynomial time, or is NP-complete [17]. In recent years, much 
effort has been directed towards the program of classifying the complexity of 
CSP(B) for all relational structures B over a finite universe; impressive progress 
has been made along these lines, including the papers [11,14,12,13,9,16,15,5, 
2-4,6, 1]. This research program has developed a rich set of tools for studying 
CSP complexity, which draws on and establishes connections to a diversity of 
areas, including artificial intelligence, universal algebra, database theory, logic, 
and group theory. The connection between CSP complexity and universal alge- 
bra [14, 12, 5] has been particularly fruitful and has been used heavily to obtain 
many of the recent results on CSP(B) complexity. A notion used to establish 
this connection is that of invariance of a relational structure under an opera- 
tion; roughly speaking, a relational structure is invariant under an operation if 
the relations of the structure satisfy a certain closure property defined in terms 
of the operation. 

A central component of the CSP(B) classification program is to identify those 
structures B such that CSP(B) is tractable. Indeed, Bulatov and Jeavons [7] have 
identified a plausible conjecture as to which structures B are tractable; in par- 
ticular, they conjecture that a known necessary condition for tractability is also 
sufficient. This conjecture has been verified for some large classes of structures 
B; see [2] and [4]. 

The acquisition of CSP(B) tractability results has, almost exclusively,
proceeded by isolating an easily describable condition believed to imply tractability,
and then demonstrating an algorithm that decides CSP(B) for all B satisfying
the condition [11, 14, 13, 9, 5, 8, 2-4, 6, 1]¹. As an example, in [8], the class of
relational structures invariant under a commutative conservative operation is 
demonstrated to be tractable by an algorithm known as 3-minimality. A sim- 
ple algorithm can be given for deciding whether or not a relational structure 
is invariant under a commutative conservative operation; on the other hand, 
the meta-question of deciding, given a relational structure B, whether or not 3- 
minimality is a solution procedure for CSP(B), is not known to be decidable. Put 
succinctly, this result (and many others) demonstrates that a well-characterized
class of relational structures is tractable, via an algorithm that is not well- 
characterized. 

1 A notable exception is the paper [9], which gives an algebraic characterization of 
those relational structures for which establishing arc consistency is a solution proce- 
dure. 
1.2 Contributions of This Paper

In this paper, we introduce a radically different approach to the acquisition 
of CSP(B) tractability results. Instead of taking as our starting point a well- 
characterized class of relational structures, we begin with an algorithm for con- 
straint satisfaction, and prove that this algorithm is well-characterized. In par- 
ticular, we introduce a polynomial-time algorithm which we call look-ahead arc 
consistency (LAAC), and show that those structures B for which LAAC decides 
CSP(B) are exactly those structures satisfying a simple algebraic criterion; this 
algebraic characterization can be readily translated to a decision procedure for 
the meta-question of deciding whether or not LAAC is a solution procedure for a 
given relational structure². We then use this algebraic characterization to give a
new class of tractable relational structures that is described using the algebraic 
notion of invariance under an operation. We hope that our work will inspire 
further research devoted to giving algebraic characterizations of algorithms sim- 
ilar to our characterization, and that such research will stimulate an interplay 
between our new approach of studying well-characterized algorithms, and the 
classical approach of studying well-characterized problem restrictions. 

In addition to containing relational structures that have not been previously 
observed to be tractable, the new class of relational structures that we identify 
also contains structures that can be shown to be tractable by known results. The 
fact that these latter structures are contained in our new class does not yield any 
new information from the standpoint of classifying CSP(B) problems as either 
tractable or NP-complete; nonetheless, we believe this fact to be interesting for 
two reasons. First, it implies that these relational structures can be solved using 
a new algorithm (namely, LAAC) which we believe to be conceptually simpler 
than previously given algorithms for the structures; second, it gives an alternative 
proof of the tractability of such structures. We mention that the LAAC algorithm 
is a solution method for the well-known 2-Satisfiability problem; this was, in 
fact, observed first in [10] for a specialization of the LAAC algorithm to inputs 
that are boolean formulas in conjunctive normal form. 

We study both LAAC and an extension of LAAC which we call smart
look-ahead arc consistency (SLAAC); like LAAC, the SLAAC algorithm runs in
polynomial time. We show that there is an algebraic characterization, similar to that
for LAAC, of the relational structures for which SLAAC is a solution proce- 
dure. We also demonstrate that SLAAC is a solution procedure for a fragment 
of the class of relational structures invariant under a commutative conservative 
operation. As mentioned, this class of structures has previously been shown to 
be tractable [8]; however, we believe our new demonstration of tractability to 
be interesting for the same two reasons (given above) that our rederivation of 
known tractability results via LAAC is interesting. 

We wish to emphasize that, from our viewpoint, the novelty of this paper lies 
in our ability to provide, for each of the two algorithms, an exact characterization 

2 Amusingly and intriguingly, LAAC is itself a decision procedure for this meta- 
question: the algebraic criterion concerns the existence of a relational homomorphism 
to B, and can be cast as an instance of CSP(B). 
of the problems CSP(B) for which the algorithm is a solution procedure, and not
in the sophistication of the algorithms, which are actually quite simple. Such 
exact characterizations of algorithms are not in general known, and we regard 
the ideas in this paper as a starting point for developing exact characterizations 
for algorithms of higher sophistication. 

In what remains of this section, we give a brief overview of the LAAC and 
SLAAC algorithms. Both of these algorithms make use of arc consistency, a
localized inference mechanism studied heavily in constraint satisfaction. Arc
consistency generalizes unit propagation, the process of removing unit clauses from
boolean formulas. Any instance of the CSP can be efficiently “tightened” into 
an equivalent instance (that is, an instance having the same satisfying homo- 
morphisms) that is arc consistent. This tightening provides a sort of one-sided 
satisfiability check: if the second instance has an empty constraint, then the 
original instance is unsatisfiable; however, the converse does not hold in general. 
When the second instance does not have any empty constraints, we say that 
arc consistency can be established on the original instance. The look-ahead arc 
consistency algorithm is the following: 

- Arbitrarily pick a variable a, and set E to be the empty set.

- For all values b of the target structure B, substitute b for a, and attempt to
establish arc consistency; if arc consistency can be established, place b into
the set E.

- If E is empty, then output “unsatisfiable”; otherwise, arbitrarily pick a value
b ∈ E and set the variable a to b.

- Repeat.

It can be seen that the algorithm proceeds by picking a variable a, and then 
constructing a “filtered” set E of possible values for a; it then commits a to any 
one of the values inside the filtered set (presuming that it is non-empty), and 
continues. Notice that it never makes sense to set the variable a to a value outside 
of its “filtered” set E, since if arc consistency cannot be established after value b 
is substituted for a, then there is no satisfying assignment mapping a to b. This 
algorithm is quite simple: indeed, up to two arbitrary choices (choice of variable 
and choice of value) can be made in each iteration, and the only conceptual 
primitive used other than picking and setting variables is arc consistency. In 
the “smart” version of look-ahead arc consistency, one of these arbitrary choices 
is eliminated: the value for a variable a is picked from the set E by applying 
a set function to E. Despite the simplicity of LAAC and SLAAC, the class of 
relational structures that are tractable via these algorithms is surprisingly rich. 
Note that, due to space limitations, some proofs are omitted. 

2 Preliminaries 

Our definitions and notation are fairly standard, and similar to those used in 
other papers on constraint satisfaction. A relation over a set A is a subset of 
$A^k$ (for some $k \geq 1$), and is said to have arity $k$. When $R$ is a relation of arity
$k$ and $i_1, \ldots, i_l \in \{1, \ldots, k\}$, we use $\mathrm{pr}_{i_1, \ldots, i_l} R$ to denote the arity $l$ relation

$$\{(a_{i_1}, \ldots, a_{i_l}) : (a_1, \ldots, a_k) \in R\}.$$
Relational structures. A vocabulary $\sigma$ is a finite set of relation symbols, each
of which has an associated arity. A relational structure $\mathbf{A}$ (over vocabulary $\sigma$)
consists of a universe $A$ and a relation $R^{\mathbf{A}}$ over $A$ for each relation symbol $R$ of
$\sigma$, such that the arity of $R^{\mathbf{A}}$ matches the arity associated to $R$ by $\sigma$. In this
paper, we will only be concerned with relational structures having finite universe.
Throughout, we will use bold capital letters $\mathbf{A}, \mathbf{B}, \ldots$ to denote relational
structures, and the corresponding non-bold capital letters $A, B, \ldots$ to denote their
universes.

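When $\mathbf{A}$ and $\mathbf{B}$ are relational structures over the same vocabulary $\sigma$, we
define $\mathbf{A} \times \mathbf{B}$ to be the relational structure with universe $A \times B$ such that for
each relation symbol $R$ of $\sigma$, it holds that
$R^{\mathbf{A} \times \mathbf{B}} = \{((a_1, b_1), \ldots, (a_k, b_k)) : (a_1, \ldots, a_k) \in R^{\mathbf{A}}, (b_1, \ldots, b_k) \in R^{\mathbf{B}}\}$,
where $k$ denotes the arity of $R$. When $\mathbf{A}$
and $\mathbf{B}$ are relational structures over the same vocabulary $\sigma$, we define $\mathbf{A} \cup \mathbf{B}$
to be the relational structure with universe $A \cup B$ such that for each relation
symbol $R$ of $\sigma$, $R^{\mathbf{A} \cup \mathbf{B}} = R^{\mathbf{A}} \cup R^{\mathbf{B}}$.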

When $\mathbf{A}$ and $\mathbf{B}$ are relational structures over the same vocabulary $\sigma$, a
homomorphism from $\mathbf{A}$ to $\mathbf{B}$ is a mapping $h : A \to B$ from the universe $A$ of
$\mathbf{A}$ to the universe $B$ of $\mathbf{B}$ such that for every relation symbol $R$ of $\sigma$ and every
tuple $(a_1, \ldots, a_k) \in R^{\mathbf{A}}$, it holds that $(h(a_1), \ldots, h(a_k)) \in R^{\mathbf{B}}$.
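The homomorphism condition can be checked mechanically. The following is a minimal Python sketch, not taken from the paper: it assumes a structure is given as a universe plus a dictionary from relation-symbol names to sets of tuples, and the names `is_homomorphism` and `find_homomorphism` are hypothetical. The search is brute force and exponential, so it is meant only to make the definition concrete.

```python
from itertools import product

def is_homomorphism(h, A_rels, B_rels):
    """True if h sends every tuple of each relation of A into the same-named relation of B."""
    return all(
        tuple(h[a] for a in tup) in B_rels[name]
        for name, tuples in A_rels.items()
        for tup in tuples
    )

def find_homomorphism(A_univ, A_rels, B_univ, B_rels):
    """Try all |B|^|A| mappings from A to B (illustration only)."""
    A_list = sorted(A_univ)
    for values in product(sorted(B_univ), repeat=len(A_list)):
        h = dict(zip(A_list, values))
        if is_homomorphism(h, A_rels, B_rels):
            return h
    return None

# Example: graph 2-colourability. B is the disequality relation on {0, 1}.
B_univ = {0, 1}
B_rels = {"E": {(0, 1), (1, 0)}}
path = {"E": {("x", "y"), ("y", "z")}}
triangle = {"E": {("x", "y"), ("y", "z"), ("z", "x")}}
h_path = find_homomorphism({"x", "y", "z"}, path, B_univ, B_rels)
h_triangle = find_homomorphism({"x", "y", "z"}, triangle, B_univ, B_rels)
```

Here the path admits a homomorphism while the triangle does not, which matches the usual reading of the instance as a 2-colouring problem.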

Constraint satisfaction. The constraint satisfaction problem, denoted by 
CSP, is defined to be the problem of deciding, given as input a pair of rela- 
tional structures (A, B) over the same vocabulary, whether or not there exists 
a homomorphism from the source structure A to the target structure B. When 
(A, B) is a CSP instance, we will say that (A, B) is satisfiable if there is a homo- 
morphism from A to B; and, we will at times refer to elements of the universe 
of the source structure A as variables. The CSP(B) problem is the restriction of 
the CSP problem where the target structure must be B. 

When $(\mathbf{A}, \mathbf{B})$ is a CSP instance over the vocabulary $\sigma$, $a \in A$, and $b \in B$, we
let $(\mathbf{A}, \mathbf{B})[a = b]$ denote the CSP instance obtained from $(\mathbf{A}, \mathbf{B})$ in the following
way: for each relation symbol $R$ of $\sigma$, remove each tuple $(a_1, \ldots, a_k) \in R^{\mathbf{A}}$
such that $a \in \{a_1, \ldots, a_k\}$ and add the tuple $(a_{i_1}, \ldots, a_{i_l})$, where $i_1 < \cdots < i_l$
and $\{i_1, \ldots, i_l\} = \{i : a_i \neq a\}$, to $S^{\mathbf{A}}$, where $S$ is a relation symbol such that

$$S^{\mathbf{B}} = \mathrm{pr}_{i_1, \ldots, i_l} \{(b_1, \ldots, b_k) \in R^{\mathbf{B}} : \forall i \in \{i : a_i = a\},\ b_i = b\};$$

if necessary, extend the vocabulary $\sigma$ and the relational structure $\mathbf{B}$ so that there is
such a relation symbol $S$. Intuitively, $(\mathbf{A}, \mathbf{B})[a = b]$ is the CSP instance obtained by
setting the variable $a$ to the value $b$; a mapping $h$ is a homomorphism for $(\mathbf{A}, \mathbf{B})$ sending
$a$ to $b$ if and only if the restriction of $h$ to $A \setminus \{a\}$ is a homomorphism for
$(\mathbf{A}, \mathbf{B})[a = b]$. We say that a relational structure $\mathbf{B}$ over vocabulary $\sigma$ permits
constant instantiation for $C$ if $C \subseteq B$ and the following two properties hold: (1)
for every $b \in C$, there exists a relation symbol $R_b$ of $\sigma$ such that $R_b^{\mathbf{B}} = \{(b)\}$;
and, (2) for every relation symbol $R$ of $\sigma$, index subset $\{i_1, \ldots, i_l\} \subseteq \{1, \ldots, k\}$
(where $k$ denotes the arity of $R$ and $i_1 < \cdots < i_l$), and $b \in C$, there exists a
relation symbol $S$ such that
$S^{\mathbf{B}} = \mathrm{pr}_{i_1, \ldots, i_l} \{(b_1, \ldots, b_k) \in R^{\mathbf{B}} : \forall i \notin \{i_1, \ldots, i_l\},\ b_i = b\}$.
The key feature of this notion is that if $(\mathbf{A}, \mathbf{B})$ is an instance of CSP($\mathbf{B}$) and
$\mathbf{B}$ permits constant instantiation for $C$, then for any $a \in A$ and $b \in C$, the
instance $(\mathbf{A}, \mathbf{B})[a = b]$ is an instance of CSP($\mathbf{B}$). If the relational structure
$\mathbf{B}$ permits constant instantiation for its entire universe, we simply say that $\mathbf{B}$
permits constant instantiation.

An equivalent way of formulating the constraint satisfaction problem is as
follows. Define a constraint to be an expression of the form $R(a_1, \ldots, a_k)$ where
$R$ is a relation of arity $k$ over a finite set $B$, and each $a_i$ is a variable taken
from a finite variable set $A$. Let us say that an assignment $h : A \to B$ satisfies
the constraint $R(a_1, \ldots, a_k)$ if $(h(a_1), \ldots, h(a_k)) \in R$. With these definitions
in hand, we can provide a constraint-based formulation of the CSP: given a
finite set of constraints (all of which have relations over the same set $B$, and
variables from the variable set $A$), decide whether or not there is an assignment
$h : A \to B$ satisfying all of the constraints. When $\mathbf{A}$ and $\mathbf{B}$ are relational
structures over the same vocabulary $\sigma$, the instance $(\mathbf{A}, \mathbf{B})$ of the CSP problem
as formulated above can be translated into this formulation by creating, for each
relation symbol $R$ of $\sigma$ and for each tuple $(a_1, \ldots, a_k)$ of $R^{\mathbf{A}}$, a constraint $R^{\mathbf{B}}(a_1, \ldots, a_k)$.
It is straightforward to verify that an assignment $h : A \to B$ satisfies all such
constraints if and only if it is a homomorphism from $\mathbf{A}$ to $\mathbf{B}$.

Arc consistency. Suppose that $C$ is a set of constraints, that is, an instance
of the constraint-based CSP. We say that $C$ is arc consistent if for any two
constraints $R(a_1, \ldots, a_k)$, $R'(a'_1, \ldots, a'_{k'})$ in $C$, if $a_i = a'_j$, then $\mathrm{pr}_i R = \mathrm{pr}_j R'$.
Any CSP instance $C$ can be transformed, in polynomial time, into an arc consistent
CSP instance that is equivalent in that it has exactly the same satisfying
assignments. This is done by continually looking for constraints $R(a_1, \ldots, a_k)$,
$R'(a'_1, \ldots, a'_{k'})$ and $i, j$ such that $a_i = a'_j$ and $\mathrm{pr}_i R \neq \mathrm{pr}_j R'$, and replacing $R$ by

$\{(a_1, \ldots, a_k) \in R : a_i \in \mathrm{pr}_j R'\}$, and $R'$ by $\{(a'_1, \ldots, a'_{k'}) \in R' : a'_j \in \mathrm{pr}_i R\}$.

The procedure of transforming a CSP instance C into an equivalent arc con- 
sistent instance C' gives a sort of one-sided satisfiability check. Specifically, if the 
second instance C' has an empty relation, we can immediately conclude that the 
original instance C is unsatisfiable. (This is because no assignment can satisfy 
a constraint having empty relation.) Note, however, that the converse does not 
hold: if an instance C is unsatisfiable, it does not (in general) follow that an 
equivalent arc-consistent instance C' contains an empty relation.
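The tightening procedure and its one-sided check can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: a constraint is represented as a pair (scope, relation), the function name `establish_arc_consistency` is hypothetical, and instead of comparing constraints pairwise the code maintains one candidate value set per variable, which reaches the same fixpoint where all projections onto a shared variable agree.

```python
def establish_arc_consistency(constraints, values):
    """Tighten constraints until projections onto every shared variable agree.

    constraints: list of (scope, relation) pairs; scope is a tuple of variables,
    relation a set of value tuples of the same length.
    Returns the tightened list, or None if some relation becomes empty.
    """
    cons = [(tuple(scope), set(rel)) for scope, rel in constraints]
    changed = True
    while changed:
        changed = False
        # Candidate values of each variable: intersection of all projections.
        dom = {}
        for scope, rel in cons:
            for i, var in enumerate(scope):
                dom[var] = dom.get(var, set(values)) & {t[i] for t in rel}
        # Drop tuples that use a value no longer allowed for some variable.
        for idx, (scope, rel) in enumerate(cons):
            kept = {t for t in rel
                    if all(t[i] in dom[v] for i, v in enumerate(scope))}
            if not kept:
                return None
            if kept != rel:
                cons[idx] = (scope, kept)
                changed = True
    return cons

NEQ = {(0, 1), (1, 0)}
tight = establish_arc_consistency([(("x",), {(0,)}), (("x", "y"), NEQ)], {0, 1})
triangle_result = establish_arc_consistency(
    [(("x", "y"), NEQ), (("y", "z"), NEQ), (("z", "x"), NEQ)], {0, 1})
clash_result = establish_arc_consistency(
    [(("x",), {(0,)}), (("x",), {(1,)})], {0, 1})
```

The triangle of disequalities over two values illustrates the failure of the converse: it is unsatisfiable, yet it is already arc consistent, so no empty relation is produced.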

When $\mathbf{B}$ is a relational structure over vocabulary $\sigma$, define $\mathcal{P}(\mathbf{B})$ to be the
relational structure over $\sigma$ having universe $\wp(B) \setminus \{\emptyset\}$ and such that for every
relation symbol $R$ of $\sigma$,
$R^{\mathcal{P}(\mathbf{B})} = \{(\mathrm{pr}_1 R', \ldots, \mathrm{pr}_k R') : R' \subseteq R^{\mathbf{B}}, R' \neq \emptyset\}$, where $k$
denotes the arity of $R$. (By $\wp(B)$, we denote the power set of $B$.) Let $(\mathbf{A}, \mathbf{B})$ be an
instance of the CSP, and let $C$ be the corresponding equivalent instance in the
constraint-based CSP formulation. When the above arc consistency procedure
can be applied to the instance $C$ to obtain an equivalent arc consistent instance
$C'$ that does not have an empty relation, there exists a homomorphism from $\mathbf{A}$
to $\mathcal{P}(\mathbf{B})$. (We can define a homomorphism $h : \mathbf{A} \to \mathcal{P}(\mathbf{B})$ as follows. For any
element $a$ of $A$, let $R'(a'_1, \ldots, a'_{k'})$ be any constraint in $C'$ such that $a'_j = a$,
and define $h(a) = \mathrm{pr}_j R'$. By the definition of arc consistency, the definition
of $h$ on $a$ is independent of the choice of constraint. It is straightforward to
verify that $h$ is a homomorphism from $\mathbf{A}$ to $\mathcal{P}(\mathbf{B})$.) Moreover, when there is a
homomorphism from $\mathbf{A}$ to $\mathcal{P}(\mathbf{B})$, the above arc consistency procedure terminates
without introducing an empty relation. In light of these facts, when $(\mathbf{A}, \mathbf{B})$ is a
CSP instance, we say that arc consistency can be established on $(\mathbf{A}, \mathbf{B})$ if there
exists a homomorphism from $\mathbf{A}$ to $\mathcal{P}(\mathbf{B})$.

Invariance. A powerful algebraic theory for studying the complexity of CSP(B) 
problems was introduced in [14, 12]. We briefly recall the key definitions of this 
theory that will be used in this paper, referring the reader to [14, 12] for a 
detailed treatment. To every relational structure B, we can associate a class of 
functions, called the polymorphisms of B, and denoted by Pol(B). A function 
$f : B^k \to B$ is a polymorphism of $\mathbf{B}$ if $f$ is a homomorphism from $\mathbf{B}^k$ to $\mathbf{B}$; when
this holds, we also say that $\mathbf{B}$ is invariant under $f$. An important fact concerning
the polymorphisms of a relational structure is that the complexity of CSP($\mathbf{B}$)
depends only on Pol($\mathbf{B}$). Precisely, if $\mathbf{B}_1$ and $\mathbf{B}_2$ are relational structures with
the same universe such that Pol($\mathbf{B}_1$) = Pol($\mathbf{B}_2$), then CSP($\mathbf{B}_1$) and CSP($\mathbf{B}_2$)
are reducible to each other via polynomial-time many-one reductions [12]. This 
tight connection between the polymorphisms of B and the complexity of CSP(B) 
permits the use of sophisticated algebraic tools in the quest to understand the 
complexity of CSP(B), for all relational structures B [14,12,5]. 
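To make the notion of invariance concrete, here is a small Python sketch that tests whether an operation is a polymorphism of a list of relations by applying it coordinate-wise to every choice of k tuples. The function name `is_polymorphism` and the example relation are assumptions introduced for illustration; the check is brute force.

```python
from itertools import product

def is_polymorphism(f, k, relations):
    """Check that the k-ary operation f preserves every relation in the list."""
    for rel in relations:
        for rows in product(rel, repeat=k):
            # Apply f coordinate-wise to the k chosen tuples.
            image = tuple(f(*column) for column in zip(*rows))
            if image not in rel:
                return False
    return True

# Example over B = {0, 1}: the Horn relation "x and y implies z".
horn = {t for t in product((0, 1), repeat=3) if t != (1, 1, 0)}
and_preserves = is_polymorphism(min, 2, [horn])  # binary AND
or_preserves = is_polymorphism(max, 2, [horn])   # binary OR
```

In this example the relation is invariant under AND but not under OR: applying OR to the tuples (1, 0, 0) and (0, 1, 0) yields (1, 1, 0), which lies outside the relation.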

3 Algorithms 

In this section, we present the look-ahead arc consistency algorithm and the 
smart look-ahead arc consistency algorithm; and, we give purely algebraic char- 
acterizations of the class of relational structures solvable by each of the algo- 
rithms. 

We begin by giving a formal description of look-ahead arc consistency. 

Algorithm 1 Look-Ahead Arc Consistency (LAAC).

Input: a CSP instance (A,B). 

— Initialize h to be an empty mapping, that is, a mapping with empty set as 
domain. 

— While the universe of A is non-empty: 

• Arbitrarily pick a variable a from the universe of A. 

• Compute the set E of values b (from the universe of B) such that arc
consistency can be established on (A,B)[a = b].

• If the set E is empty: 

* Output “unsatisfiable” and terminate. 

• Else: 

* Arbitrarily pick a value b from E, and extend h to map a to b. 

* Replace (A,B) with (A,B)[a = b].

— Output the mapping h and terminate. 
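Algorithm 1 can be sketched in Python on the constraint-based formulation. This is a hedged illustration rather than the authors' code: the helper names `arc_consistent`, `substitute`, and `laac` are hypothetical, the "arbitrary" choices are resolved by taking variables in input order and the first value of E, and arc consistency is tested with per-variable value sets rather than pairwise comparison of constraints. The returned mapping is only guaranteed to be a homomorphism for target structures on which LAAC is a solution procedure.

```python
def arc_consistent(constraints, values):
    """True if arc consistency can be established (no relation becomes empty)."""
    cons = [(scope, set(rel)) for scope, rel in constraints]
    changed = True
    while changed:
        changed = False
        dom = {}
        for scope, rel in cons:
            for i, var in enumerate(scope):
                dom[var] = dom.get(var, set(values)) & {t[i] for t in rel}
        for idx, (scope, rel) in enumerate(cons):
            kept = {t for t in rel
                    if all(t[i] in dom[v] for i, v in enumerate(scope))}
            if not kept:
                return False
            if kept != rel:
                cons[idx] = (scope, kept)
                changed = True
    return True

def substitute(constraints, a, b):
    """The instance [a = b]: keep tuples with b at a's positions, then drop a."""
    out = []
    for scope, rel in constraints:
        keep = [i for i, v in enumerate(scope) if v != a]
        new_rel = {tuple(t[i] for i in keep) for t in rel
                   if all(t[i] == b for i, v in enumerate(scope) if v == a)}
        out.append((tuple(scope[i] for i in keep), new_rel))
    return out

def laac(variables, constraints, values):
    """Look-ahead arc consistency; returns a mapping, or None for 'unsatisfiable'."""
    h = {}
    for a in variables:  # arbitrary choice of variable: input order
        E = [b for b in values
             if arc_consistent(substitute(constraints, a, b), values)]
        if not E:
            return None
        b = E[0]         # arbitrary choice of value: first element of E
        h[a] = b
        constraints = substitute(constraints, a, b)
    return h

# 2-Satisfiability example: (x or y) and (not x or y) and (not y or z).
OR = {(0, 1), (1, 0), (1, 1)}
IMPLIES = {(0, 0), (0, 1), (1, 1)}
formula = [(("x", "y"), OR), (("x", "y"), IMPLIES), (("y", "z"), IMPLIES)]
solution = laac(["x", "y", "z"], formula, [0, 1])
contradiction = laac(["x"], [(("x",), {(1,)}), (("x",), {(0,)})], [0, 1])
```

On the 2-Satisfiability example the sketch commits x to 0, after which the look-ahead step leaves only the value 1 for y and for z.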
When B is a relational structure, we will say that LAAC is a solution
procedure for CSP(B) if for every instance (A,B) of CSP(B), when there is a
homomorphism from A to B, the LAAC algorithm outputs such a homomorphism.
Notice that if there is no homomorphism from A to B, the LAAC algorithm will 
always output “unsatisfiable” on the CSP instance (A,B). 

We now give an algebraic characterization of those relational structures B 
for which LAAC solves CSP(B). Our algebraic characterization has a particularly 
simple form, namely, it concerns the existence of a homomorphism from $\mathcal{L}(\mathbf{B})$,
a relational structure derivable from $\mathbf{B}$, to $\mathbf{B}$ itself.

The relational structure $\mathcal{L}(\mathbf{B})$ is defined as follows. For a relational structure 
B with universe $B$, let $L(B)$ be the set 

$$L(B) = (\{S \in \wp(B) : |S| > 1\} \times B) \cup \{(\{s\}, *) : s \in B\}$$

where $*$ is assumed to not be an element of $B$. 

Define $c_B : (\wp(B) \setminus \{\emptyset\}) \times B \to L(B)$ to be the (surjective) mapping such 
that $c_B(S,b) = (S,*)$ if $|S| = 1$, and $c_B(S,b) = (S,b)$ if $|S| > 1$. Intuitively, 
the mapping $c_B$ “collapses” together elements of $\mathcal{P}(\mathbf{B}) \times \mathbf{B}$ that share the same 
singleton set in their first coordinate. Define $\mathcal{L}(\mathbf{B})$ to be the relational structure, 
over the same vocabulary $\sigma$ as B, having universe $L(B)$ and such that for all 
relation symbols $R$ of $\sigma$, 

$$R^{\mathcal{L}(\mathbf{B})} = \{(c_B(t_1), \ldots, c_B(t_k)) : (t_1, \ldots, t_k) \in R^{\mathcal{P}(\mathbf{B}) \times \mathbf{B}}\},$$

where $k$ denotes the arity of $R$. Clearly, $c_B$ is a homomorphism from $\mathcal{P}(\mathbf{B}) \times \mathbf{B}$ 
to $\mathcal{L}(\mathbf{B})$; one can think of $\mathcal{L}(\mathbf{B})$ as a “collapsed” version of $\mathcal{P}(\mathbf{B}) \times \mathbf{B}$. 
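
As a small illustration of the definition above, the universe $L(B)$ and the collapsing map $c_B$ can be computed directly for a finite set. The names `STAR`, `collapse`, and `universe_L` are hypothetical, and the sketch only builds the universe, not the relations of the collapsed structure.

```python
from itertools import combinations

STAR = "*"  # placeholder element, assumed not to belong to B


def nonempty_subsets(B):
    """All non-empty subsets of B, as frozensets."""
    items = sorted(B)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def collapse(S, b):
    """c_B: send (S, b) to (S, *) when S is a singleton, else keep (S, b)."""
    return (S, STAR) if len(S) == 1 else (S, b)


def universe_L(B):
    """L(B), obtained as the image of c_B over all pairs (S, b)."""
    return {collapse(S, b) for S in nonempty_subsets(B) for b in B}
```

For a three-element set, the four subsets of size greater than one contribute 4 × 3 pairs and the three singletons contribute one element each, so `universe_L` has 15 elements.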

Theorem 1. Let B be a relational structure (with universe $B$). The LAAC 
algorithm is a solution procedure for CSP(B) if and only if there exists a 
homomorphism $l : \mathcal{L}(\mathbf{B}) \to \mathbf{B}$ such that $l(\{b\}, *) = b$ for all $b \in B$. 

Proof. Suppose that there exists a homomorphism $l$ of the described form. We 
assume that B permits constant instantiation; if it does not, it may be expanded 
to a relational structure which does and still possesses the homomorphism $l$. 
It suffices to show that each time a variable is set by LAAC, satisfiability is 
preserved. Precisely, we show that when A is a relational structure such that 
(A, B) is satisfiable (via the homomorphism $h : \mathbf{A} \to \mathbf{B}$), then for any $a \in A$, 
$b \in B$, if arc consistency can be established on (A, B)[a = b], then (A, B)[a = b] 
is satisfiable. When arc consistency can be established on (A, B)[a = b], it follows 
(by the definitions of (A, B)[a = b] and $\mathcal{P}(\mathbf{B})$) that there is a homomorphism 
$p : \mathbf{A} \to \mathcal{P}(\mathbf{B})$ such that $p(a) = \{b\}$. Composing the homomorphisms 
$(p,h) : \mathbf{A} \to \mathcal{P}(\mathbf{B}) \times \mathbf{B}$, $c_B : \mathcal{P}(\mathbf{B}) \times \mathbf{B} \to \mathcal{L}(\mathbf{B})$, and $l : \mathcal{L}(\mathbf{B}) \to \mathbf{B}$, we obtain a 
homomorphism from A to B sending $a$ to $b$, as desired. 

Suppose that the LAAC algorithm is a solution procedure for CSP(B). It 
suffices to show that there is a homomorphism $h : \mathcal{P}(\mathbf{B}) \times \mathbf{B} \to \mathbf{B}$ such that 
$h(\{b_1\}, b_2) = b_1$ for all $b_1, b_2 \in B$, as such a homomorphism can be viewed as the 
composition of $c_B$ and $l$, for some homomorphism $l$ of the described form. We 
assume for ease of presentation that B permits constant instantiation; the same 
ideas apply when B does not permit constant instantiation. For every $b \in B$, 
let $R_b$ be a relation symbol such that $R_b^{\mathbf{B}} = \{(b)\}$. Let $\mathbf{C}_{b_1,b_2}$ be the relational 
structure over the same vocabulary as B with universe $(\wp(B) \setminus \{\emptyset\}) \times B$ such 
that $R_{b_1}^{\mathbf{C}_{b_1,b_2}} = \{(\{b_1\}, b_2)\}$, and $R^{\mathbf{C}_{b_1,b_2}} = \emptyset$ for all other relation symbols $R$. Let 
$\mathcal{C}$ denote the set $\{\mathbf{C}_{b_1,b_2} : b_1, b_2 \in B\}$. We prove that for every subset $\mathcal{C}' \subseteq \mathcal{C}$, 
there is a homomorphism from $(\mathcal{P}(\mathbf{B}) \times \mathbf{B}) \cup (\bigcup_{\mathbf{C} \in \mathcal{C}'} \mathbf{C})$ to B; this suffices, as 
such a homomorphism is precisely a homomorphism $h$ of the form desired. (The 
presence of a structure $\mathbf{C}_{b_1,b_2}$ as part of the source structure ensures that any 
homomorphism maps $(\{b_1\}, b_2)$ to $b_1$.) 

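The proof is by induction on $|\mathcal{C}'|$. Suppose $\mathcal{C}' = \emptyset$. The mapping from 
$(\wp(B) \setminus \{\emptyset\}) \times B$ to $B$ that projects onto the second coordinate is a homomorphism from 
$\mathcal{P}(\mathbf{B}) \times \mathbf{B}$ to B. Now suppose $|\mathcal{C}'| \geq 1$. Let $\mathcal{C}'' \subseteq \mathcal{C}$ and $b_1, b_2 \in B$ be such that 
$\mathcal{C}' = \mathcal{C}'' \cup \{\mathbf{C}_{b_1,b_2}\}$ and $|\mathcal{C}''| + 1 = |\mathcal{C}'|$. The mapping from $(\wp(B) \setminus \{\emptyset\}) \times B$ 
to $\wp(B) \setminus \{\emptyset\}$ that projects onto the first coordinate is a homomorphism from 
$(\mathcal{P}(\mathbf{B}) \times \mathbf{B}) \cup (\bigcup_{\mathbf{C} \in \mathcal{C}''} \mathbf{C})$ to $\mathcal{P}(\mathbf{B})$, so arc consistency can be established on 
$((\mathcal{P}(\mathbf{B}) \times \mathbf{B}) \cup (\bigcup_{\mathbf{C} \in \mathcal{C}''} \mathbf{C}), \mathbf{B})[(\{b_1\}, b_2) = b_1]$. Moreover, the instance 
$((\mathcal{P}(\mathbf{B}) \times \mathbf{B}) \cup (\bigcup_{\mathbf{C} \in \mathcal{C}''} \mathbf{C}), \mathbf{B})$ is satisfiable by induction, so by the assumption that LAAC is 
an algorithm for CSP(B), the instance $((\mathcal{P}(\mathbf{B}) \times \mathbf{B}) \cup (\bigcup_{\mathbf{C} \in \mathcal{C}'} \mathbf{C}), \mathbf{B})$ is satisfiable. □ 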

Corollary 2. The class of relational structures B having the property that the 
LAAC algorithm is a solution procedure for CSP(B) is decidable. 

We now turn to the smart look-ahead arc consistency (SLAAC) algorithm. 
The difference between the LAAC and SLAAC algorithms is that the SLAAC 
algorithm takes as a parameter a set function $f : \wp(B) \setminus \{\emptyset\} \to B$, and when 
setting a variable, instead of arbitrarily picking from the set E of candidate 
values, directly sets the variable to $f(E)$. Note that for any relational structure 
B, if LAAC is a solution procedure for CSP(B), then SLAAC is also a solution procedure 
for CSP(B), when parameterized with any set function that is conservative in the sense 
that $f(S) \in S$, for all non-empty $S \subseteq B$. 
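
The only change relative to LAAC is the value selection step, which the following self-contained Python sketch makes explicit. As before, this is an illustration under assumptions rather than the authors' code: the `(scope, relation)` encoding with distinct variables per scope, the helper names `ac_ok` and `slaac`, and the simple fixpoint loop used for arc consistency are choices made for the example. The parameter `f` is expected to be a set function on non-empty subsets of the value set.

```python
def ac_ok(doms, constraints):
    """True if arc consistency can be established (no domain wipe-out)."""
    doms = {v: set(d) for v, d in doms.items()}
    changed = True
    while changed:
        changed = False
        for scope, rel in constraints:
            live = [t for t in rel
                    if all(t[i] in doms[v] for i, v in enumerate(scope))]
            for i, v in enumerate(scope):
                keep = doms[v] & {t[i] for t in live}
                if not keep:
                    return False
                if keep != doms[v]:
                    doms[v], changed = keep, True
    return True


def slaac(variables, values, constraints, f):
    """Smart look-ahead arc consistency, parameterized by a set function f."""
    doms = {v: set(values) for v in variables}
    h = {}
    for a in variables:
        # E: candidate values that survive the arc consistency look-ahead.
        E = frozenset(b for b in doms[a]
                      if ac_ok({**doms, a: {b}}, constraints))
        if not E:
            return None       # "unsatisfiable"
        h[a] = f(E)           # set the variable to f(E) instead of an arbitrary pick
        doms[a] = {h[a]}
    return h
```

Passing a conservative set function such as `max` or `min` reproduces one particular run of LAAC, in line with the remark above.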

The algebraic characterization for the SLAAC algorithm, as with that for 
the LAAC algorithm, concerns the existence of certain homomorphisms. Before 
giving this characterization, it is necessary to introduce the following auxiliary 
notion. For a relational structure B, we define a pair $(U, b)$, where $U \subseteq B$ and 
$b \in U$, to be usable if there exists a satisfiable CSP instance (A, B) and there 
exists a variable $a \in A$ such that 

$U = \{u \in B$ : arc consistency can be established on (A, B)[a = u]$\}$ 

and there is a homomorphism from A to B sending $a$ to $b$. 

Theorem 3. Let $f : \wp(B) \setminus \{\emptyset\} \to B$ be a set function on the set $B$, and let B be 
a relational structure with universe $B$ permitting constant instantiation for the 
image of $f$. The SLAAC algorithm, parameterized with $f$, is a solution procedure 
for CSP(B) if and only if for every usable pair $(U, b)$ (of B), there exists a 
homomorphism $h_{U,b} : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathbf{B}$ such that $h_{U,b}(\{b_1\}, \ldots, \{b_k\}, b) = f(U)$, 
where $b_1, \ldots, b_k$ denote the elements of $U$. 
Proof. Suppose that there exist homomorphisms $h_{U,b}$ of the described form. As 
in the proof of Theorem 1, it suffices to show that each time a variable is set by 
SLAAC, satisfiability is preserved. Precisely, we show that when A is a relational 
structure such that $a \in A$, (A, B) is satisfiable via a homomorphism $h : \mathbf{A} \to \mathbf{B}$ 
mapping $a$ to $b$, and $U \subseteq B$ contains exactly those values $c \in B$ such that 
arc consistency can be established on (A,B)[a = c], then (A,B)[a = f(U)] is 
satisfiable. Note that given these assumptions, the pair $(U, b)$ must be usable. 
Let $b_1, \ldots, b_k$ be the elements of $U$; since (for each $i$) arc consistency can be 
established on (A, B)[a = $b_i$], there is a homomorphism $p_i : \mathbf{A} \to \mathcal{P}(\mathbf{B})$ such that 
$p_i(a) = \{b_i\}$. Composing the homomorphisms $(p_1, \ldots, p_k, h) : \mathbf{A} \to \mathcal{P}(\mathbf{B})^k \times \mathbf{B}$ 
and $h_{U,b} : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathbf{B}$, we obtain a homomorphism from A to B sending $a$ 
to $f(U)$, as desired. 

Suppose that the SLAAC algorithm is a solution procedure for CSP(B). Let 
$(U, b)$ be a usable pair, and let $b_1, \ldots, b_k$ denote the elements of $U$. We show that 
there exists a homomorphism $h : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathbf{B}$ such that 
$h(\{b_1\}, \ldots, \{b_k\}, b) = f(U)$. Let A denote the structure $\mathcal{P}(\mathbf{B})^k \times \mathbf{B}$, and let $a$ denote the element 
$(\{b_1\}, \ldots, \{b_k\}, b)$ of A. The projection onto the $i$th coordinate is a 
homomorphism from A to $\mathcal{P}(\mathbf{B})$ mapping $a$ to $\{b_i\}$, 
implying that arc consistency can be established on (A, B)[a = $b_i$] 
for all $b_i \in U$. Also, the projection onto the last coordinate is a 
homomorphism from A to B mapping $a$ to $b$. 

We extend A so that the values in $U$ are exactly those values $c$ such that 
arc consistency can be established on (A,B)[a = c], in the following way. Let 
$(\mathbf{A}_0, \mathbf{B})$ be a satisfiable CSP instance with a variable $a_0 \in A_0$ such that $U = 
\{c \in B$ : arc consistency can be established on $(\mathbf{A}_0, \mathbf{B})[a_0 = c]\}$, and there is 
a homomorphism from $\mathbf{A}_0$ to B sending $a_0$ to $b$. (Such a CSP instance exists 
by the definition of usable pair.) Identify the variable $a_0$ with $a$, assume that 
$A_0 \setminus \{a_0\}$ and $A \setminus \{a\}$ are disjoint, and consider the CSP instance $(\mathbf{A} \cup \mathbf{A}_0, \mathbf{B})$. 
The values $c$ in $U$ are exactly those values such that arc consistency can be 
established on $(\mathbf{A} \cup \mathbf{A}_0, \mathbf{B})[a = c]$. Moreover, $(\mathbf{A} \cup \mathbf{A}_0, \mathbf{B})$ is satisfiable via a 
homomorphism sending $a$ to $b$. By the hypothesis that SLAAC solves CSP(B), 
the instance $(\mathbf{A} \cup \mathbf{A}_0, \mathbf{B})$ is satisfiable via a homomorphism $h'$ sending $a$ to $f(U)$. 
The restriction of $h'$ to A gives the desired homomorphism. □ 

As for LAAC, we can show that the class of structures that are solvable by 
SLAAC is decidable. This follows from the algebraic characterization given in the 
previous theorem, along with the following theorem. 

Theorem 4. Let B be a relational structure, and let $(U, b)$ be a pair where 
$U \subseteq B$ and $b \in U$. The problem of determining whether or not $(U, b)$ is usable 
is decidable. 

Proof. Let $(U, b)$ be a pair where $U \subseteq B$ and $b \in U$, and let $b_1, \ldots, b_k$ denote 
the elements of $U$. We show that $(U, b)$ is usable if and only if for all 
homomorphisms $h : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathcal{P}(\mathbf{B})$ such that $|h(\{b_1\}, \ldots, \{b_k\}, b)| = 1$, it holds 
that $h(\{b_1\}, \ldots, \{b_k\}, b) \in U$. This suffices to establish the theorem: there are 
finitely many homomorphisms from $\mathcal{P}(\mathbf{B})^k \times \mathbf{B}$ to $\mathcal{P}(\mathbf{B})$, so the condition can 
be effectively checked. 

Suppose that the given homomorphism condition holds. Set $\mathbf{A} = \mathcal{P}(\mathbf{B})^k \times \mathbf{B}$ 
and $a = (\{b_1\}, \ldots, \{b_k\}, b)$. We claim that the instance (A, B) and the variable 
$a$ are witness to the usability of $(U, b)$. Observe that arc consistency can be 
established on (A,B)[a = u] if and only if there is a homomorphism from A to 
$\mathcal{P}(\mathbf{B})$ sending $a$ to $\{u\}$. Hence, by assumption, we cannot establish arc 
consistency on (A,B)[a = u] when $u \notin U$. But we can establish arc consistency on 
(A, B)[a = $b_i$] for any $b_i \in U$, since the projection onto the $i$th coordinate is a 
homomorphism from $\mathcal{P}(\mathbf{B})^k \times \mathbf{B}$ to $\mathcal{P}(\mathbf{B})$ sending $a$ to $\{b_i\}$. 

Now suppose that the given homomorphism condition fails. We prove that 
$(U, b)$ is not usable, by contradiction. Suppose that $(U, b)$ is usable. Let A and 
$a \in A$ be witness to the usability of $(U, b)$. It follows that there is a 
homomorphism $h : \mathbf{A} \to \mathcal{P}(\mathbf{B})^k \times \mathbf{B}$ sending $a$ to $(\{b_1\}, \ldots, \{b_k\}, b)$. Since the 
homomorphism condition fails, there is a homomorphism $h' : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathcal{P}(\mathbf{B})$ such 
that $h'(\{b_1\}, \ldots, \{b_k\}, b) = c$, where $c \notin U$. Composing $h$ and $h'$, we obtain a 
homomorphism from A to $\mathcal{P}(\mathbf{B})$ sending $a$ to $c$; this implies that arc consistency 
can be established on (A, B)[a = c], contradicting that A and $a$ are witness to 
the usability of $(U, b)$. □ 

Corollary 5. The class of relational structures B, having the property that the 
SLAAC algorithm is a solution procedure for CSP(B), is decidable. 

We can employ Theorem 3 to demonstrate that the SLAAC algorithm solves 
CSP(B) when the relational structure B is invariant under a set function. Set 
functions were studied in the context of CSP complexity in [9]; we say that a 
relational structure B is invariant under a set function $f : (\wp(B) \setminus \{\emptyset\}) \to B$ if 
$f$ is a homomorphism from $\mathcal{P}(\mathbf{B})$ to B; or, equivalently, if it is invariant under 
all of the functions $f_k : B^k \to B$ defined by $f_k(b_1, \ldots, b_k) = f(\{b_1, \ldots, b_k\})$, for 
$k \geq 1$. 
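
For a single finite relation, the second formulation can be tested by brute force: applying $f_k$ coordinatewise to $k$ tuples of the relation depends only on the set of tuples used, so it is enough to range over the non-empty subsets of the relation. The sketch below does exactly that; the function name is hypothetical, and the enumeration is exponential in the size of the relation, so it is meant only for small examples.

```python
from itertools import combinations


def invariant_under_set_function(relation, f):
    """Check that applying f coordinatewise to every non-empty set of
    tuples of `relation` yields a tuple of `relation`."""
    tuples = sorted(relation)
    if not tuples:
        return True
    arity = len(tuples[0])
    for r in range(1, len(tuples) + 1):
        for sub in combinations(tuples, r):
            image = tuple(f(frozenset(t[i] for t in sub))
                          for i in range(arity))
            if image not in relation:
                return False
    return True
```

For example, the order relation on {0, 1} is invariant under the set function `min`, while the disequality relation on {0, 1} is not.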

Theorem 6. Suppose that the relational structure B (with universe $B$) is 
invariant under a set function $f : (\wp(B) \setminus \{\emptyset\}) \to B$. Then, the SLAAC algorithm 
is a solution procedure for CSP(B). 

Proof. We first demonstrate that B is invariant under a set function $h$ such that 
the map $b \mapsto h(\{b\})$ acts as the identity on all elements in $\mathrm{im}(h)$. (We use im 
to denote the image of a function.) Let $c \geq 1$ be sufficiently large so that for 
all $n \geq c$, $\mathrm{im}(f_1^c) = \mathrm{im}(f_1^n)$. Define $g$ to be the set function on $B$ such that 
$g(S) = f_1^c(f(S))$. We have $\mathrm{im}(g_1) = \mathrm{im}(g) = \mathrm{im}(f_1^c)$. Let $d \geq 1$ be sufficiently 
high so that $g_1^d$ acts as the identity on all elements in its image, and define $h$ 
to be the set function on $B$ such that $h(S) = g_1^{d-1}(g(S))$. The map $b \mapsto h(\{b\})$ 
acts as the identity on all elements in $\mathrm{im}(g_1)$, and hence on all elements in $\mathrm{im}(h)$ 
(which is a subset of $\mathrm{im}(g_1)$). Moreover, it is straightforward to verify that B is 
invariant under all of the functions discussed, in particular, $h$. 

We can assume that the structure B permits constant instantiation on all 
elements of $\mathrm{im}(h)$; if it does not, we can expand B so that it does, while preserving 
the invariance of B under $h$. 
We claim that the SLAAC algorithm, parameterized with $h$, solves CSP(B). 
This follows from Theorem 3: for each usable pair $(U, b)$ with $|U| = k$, we 
have a homomorphism $h_{U,b} : \mathcal{P}(\mathbf{B})^k \times \mathbf{B} \to \mathbf{B}$ of the desired form, given by 

$$h_{U,b}(S_1, \ldots, S_k, b) = h\left(\bigcup_{i=1}^{k} S_i\right).$$ □ 

4 Tractability 

In this section, we present our primary tractability result, which shows that 
those relational structures invariant under a certain type of ternary function 
are tractable via the LAAC algorithm. We also demonstrate that the relational 
structures for which LAAC is a solution procedure can be combined in certain 
ways. 

Theorem 7. Suppose that $t : B^3 \to B$ is a ternary function satisfying the 
three identities $t(x,x,y) = x$, $t(x,y,z) = t(y,x,z)$, and $t(t(x,y,w),z,w) = 
t(x,t(y,z,w),w)$. For any relational structure B (with universe $B$) invariant 
under $t$, the LAAC algorithm is a solution procedure for CSP(B). 

Proof. We have that, for each fixed $b \in B$, the binary operation $g_b : B^2 \to B$ 
defined by $g_b(x,y) = t(x,y,b)$ is a semilattice operation. Define 
$h : (\wp(B) \setminus \{\emptyset\}) \times B \to B$ to be the mapping defined by 

$$h(\{s_1, \ldots, s_k\}, b) = g_b(s_1, g_b(s_2, \ldots, g_b(s_{k-1}, s_k) \cdots)).$$

Note that the right-hand side is well-defined, since $g_b$ is a semilattice operation. 
It can be verified that $h$ is a homomorphism from $\mathcal{P}(\mathbf{B}) \times \mathbf{B}$ to B sending any 
element of the form $(\{s\}, b)$ to $s$; hence, $h$ can be factored as the composition of 
the homomorphism $c_B : \mathcal{P}(\mathbf{B}) \times \mathbf{B} \to \mathcal{L}(\mathbf{B})$ and a homomorphism $l : \mathcal{L}(\mathbf{B}) \to \mathbf{B}$ 
such that $l(\{s\}, *) = s$ (for all $s \in B$). It follows from Theorem 1 that the LAAC 
algorithm is a solution procedure for CSP(B). □ 

One way to view the three identities given in the statement of Theorem 7 is 
that they require, for each fixed $b \in B$, that the binary operation $g_b : B^2 \to B$ 
defined by $g_b(x,y) = t(x,y,b)$ is a semilattice operation. The proof of Theorem 7 
uses the algebraic characterization of LAAC given by Theorem 1. The relational 
structures identified by Theorem 7 constitute a new tractable class, described 
using the notion of invariance, that has not been previously observed. 
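
The three identities of Theorem 7 are easy to test exhaustively on a small universe. The following sketch (with hypothetical function names) checks them by brute force and applies the check to the dual discriminator and to the median operation, the two operations considered in the corollaries that follow.

```python
from itertools import product


def satisfies_theorem7_identities(t, B):
    """Brute-force check of t(x,x,y) = x, t(x,y,z) = t(y,x,z) and
    t(t(x,y,w),z,w) = t(x,t(y,z,w),w) over the finite universe B."""
    B = list(B)
    for x, y, z, w in product(B, repeat=4):
        if t(x, x, y) != x:
            return False
        if t(x, y, z) != t(y, x, z):
            return False
        if t(t(x, y, w), z, w) != t(x, t(y, z, w), w):
            return False
    return True


def dual_discriminator(x, y, z):
    """Returns x if x = y, and z otherwise."""
    return x if x == y else z


def median(x, y, z):
    """Median of three elements of a totally ordered universe."""
    return sorted((x, y, z))[1]
```

A ternary projection such as `lambda x, y, z: z` fails the first identity, so the check is not vacuous.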

We can derive the following corollaries from Theorem 7. 

Corollary 8. Let $d : B^3 \to B$ be the dual discriminator on $B$, that is, the 
function such that $d(x,y,z)$ is equal to $x$ if $x = y$, and $z$ otherwise. For any 
relational structure B (with universe $B$) invariant under $d$, the LAAC algorithm 
is a solution procedure for CSP(B). 

The dual discriminator is an example of a near-unanimity operation; 
invariance under a near-unanimity operation has previously been demonstrated to 
imply CSP(B) tractability [13]. Another example of a near-unanimity operation 
that can be shown to imply tractability by Theorem 7 is as follows. 

Corollary 9. Let $B = \{1, \ldots, k\}$, and let median $: B^3 \to B$ be the ternary 
function on $B$ which returns the median of its arguments. For any relational 
structure B (with universe $B$) invariant under median, the LAAC algorithm is a 
solution procedure for CSP(B). 

The next theorem demonstrates that relational structures for which LAAC is a 
decision procedure can be combined together to give further relational structures 
also solvable by LAAC. 

Theorem 10. Suppose that $B_1, \ldots, B_k$ are sets, none of which contain $\top$ as an 
element, and suppose that $l_1 : L(B_1) \to B_1, \ldots, l_k : L(B_k) \to B_k$ are functions 
such that (for all $i = 1, \ldots, k$) it holds that (1) $l_i(\{b_i\}, *) = b_i$, for all $b_i \in B_i$, 
and (2) if $B_i \cap B_j \neq \emptyset$, then the restrictions of $l_i$ and $l_j$ to $L(B_i \cap B_j)$ are equal, 
for all $j = 1, \ldots, k$. Define $B$ to be the set $(\bigcup_{i=1}^{k} B_i) \cup \{\top\}$, and let $l : L(B) \to B$ 
be the function defined by 

— $l(S,b) = s$ if $(S,b) = (\{s\}, *)$, 

— $l(S,b) = l_i(S,b)$ if $|S| > 1$ and $S \cup \{b\} \subseteq B_i$ for some $i$, and 

— $l(S,b) = \top$ otherwise. 

For any relational structure B (with universe $B$) such that $l$ is a 
homomorphism from $\mathcal{L}(\mathbf{B})$ to B, the LAAC algorithm is a solution procedure for CSP(B). 
Moreover, for any relational structure $\mathbf{B}_i$ (with universe $B_i$) such that $l_i$ is a 
homomorphism from $\mathcal{L}(\mathbf{B}_i)$ to $\mathbf{B}_i$, $l$ is also a homomorphism from $\mathcal{L}(\mathbf{B}_i)$ to $\mathbf{B}_i$. 

5 Commutative Conservative Operations 

We say that a binary operation • over the set $B$ is commutative conservative if 
for all $a, b \in B$, it holds that $a \bullet b = b \bullet a$; and, for all $a, b \in B$, it holds that 
$a \bullet b \in \{a, b\}$. As mentioned in the introduction, relational structures invariant under 
a commutative conservative operation have been demonstrated to be tractable 
[8], via an algorithm that is not well-characterized (in the sense discussed in the 
introduction). In this final section, we utilize the algebraic technology developed 
in Section 3 to give a precise classification of those commutative conservative 
operations whose invariant structures are tractable by the SLAAC algorithm, 
which as we have shown, is well-characterized. 

Our classification is stated in terms of a “forbidden configuration”. In 
particular, we show that the commutative conservative groupoids (B, •) tractable via 
SLAAC are precisely those that do not contain a forbidden subalgebra F.³ Define 
F to be the commutative conservative groupoid having universe {0, 1, 2, 3} and 
the following Cayley table. 

    •   0  1  2  3 
    0   0  1  2  3 
    1   1  1  2  1 
    2   2  2  2  3 
    3   3  1  3  3 

³ In this paper, a commutative conservative groupoid is simply a set endowed with a 
commutative conservative binary operation. 
The groupoid F can be regarded as the so-called stone-scissors-paper algebra 
with a bottom element adjoined. 

Theorem 11. Let (B, •) be a commutative conservative groupoid. The following 
two statements are equivalent: 

— The groupoid (B, •) does not have a subalgebra isomorphic to F. 

— For every relational structure B (with universe B) invariant under •, the 
SLAAC algorithm is a solution procedure for CSP(B). 
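
The first condition of Theorem 11 can be tested mechanically for a finite groupoid. The sketch below is an illustration with hypothetical names: it encodes the Cayley table of F, checks that an operation is commutative and conservative, and searches for a four-element subset on which the operation is isomorphic to F. It relies on the observation that in a conservative groupoid every subset is closed under the operation, so every four-element subset is a candidate subalgebra.

```python
from itertools import combinations, permutations

# Cayley table of F; rows and columns are indexed by 0..3.
F = [[0, 1, 2, 3],
     [1, 1, 2, 1],
     [2, 2, 2, 3],
     [3, 1, 3, 3]]


def is_commutative_conservative(op, B):
    """op(a,b) = op(b,a) and op(a,b) in {a,b} for all a, b in B."""
    return all(op(a, b) == op(b, a) and op(a, b) in (a, b)
               for a in B for b in B)


def has_subalgebra_isomorphic_to_F(op, B):
    """Search all 4-element subsets and all bijections onto F."""
    for subset in combinations(sorted(B), 4):
        for image in permutations(subset):
            # image[i] is the element playing the role of i in F.
            if all(op(image[i], image[j]) == image[F[i][j]]
                   for i in range(4) for j in range(4)):
                return True
    return False
```

For example, F itself is detected, while the semilattice operation `min` on a chain contains no copy of F because it has no stone-scissors-paper cycle.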
Abstract. The success of stochastic algorithms is often due to their ability to 
effectively amplify the performance of search heuristics. This is certainly the 
case with stochastic sampling algorithms such as heuristic-biased stochastic sam- 
pling (HBSS) and value-biased stochastic sampling (VBSS), wherein a heuristic 
is used to bias a stochastic policy for choosing among alternative branches in the 
search tree. One complication in getting the most out of algorithms like HBSS 
and VBSS in a given problem domain is the need to identify the most effective 
search heuristic. In many domains, the relative performance of various heuristics 
tends to vary across different problem instances and no single heuristic domi- 
nates. In such cases, the choice of any given heuristic will be limiting and it 
would be advantageous to gain the collective power of several heuristics. Toward 
this goal, this paper describes a framework for integrating multiple heuristics 
within a stochastic sampling search algorithm. In its essence, the framework uses 
online-generated statistical models of the search performance of different base 
heuristics to select which to employ on each subsequent iteration of the search. 
To estimate the solution quality distribution resulting from repeated application 
of a strong heuristic within a stochastic search, we propose the use of models 
from extreme value theory (EVT). Our EVT-motivated approach is validated on 
the NP-Hard problem of resource-constrained project scheduling with time win- 
dows (RCPSP/max). Using VBSS as a base stochastic sampling algorithm, the 
integrated use of a set of project scheduling heuristics is shown to be competi- 
tive with the current best known heuristic algorithm for RCPSP/max and in some 
cases even improves upon best known solutions to difficult benchmark instances. 



1 Introduction 

The success of stochastic sampling algorithms such as Heuristic-Biased Stochastic 
Sampling (HBSS) [1] and Value-Biased Stochastic Sampling (VBSS) [2,3] stems from their 
ability to amplify the performance of search heuristics. The essential idea underlying 
these algorithms is to use the heuristic’s valuation of various choices at a given search 
node to bias a stochastic decision, and, in doing so, to randomly perturb the heuristic’s 
prescribed (deterministic) trajectory through the search space. In the case of HBSS, a 
rank-ordering of possible choices is used as heuristic bias; in VBSS, alternatively, the 
actual heuristic value attributed to each choice is used. This stochastic choice process 
enables generation of different solutions on successive iterations (or restarts), and ef- 
fectively results in a broader search in the “neighborhood” defined by the deterministic 
heuristic. HBSS has been shown to significantly improve the performance of a heuris- 
tic for scheduling telescope observations [1]. VBSS has shown similar ability to im- 
prove search performance in weighted-tardiness scheduling [2, 3], resource-constrained 
project scheduling [3], and in a multi-rover exploration domain [4]. 
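
The difference between the two biasing schemes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the default bias functions, and the conventions that larger heuristic values are preferred and that all weights are non-negative are assumptions made for the example, not details taken from [1] or [2, 3].

```python
import random


def vbss_choice(choices, heuristic, bias=lambda v: v, rng=random):
    """Value-biased: weight each choice by a bias function of its
    heuristic value (assumed non-negative, larger is better)."""
    weights = [bias(heuristic(c)) for c in choices]
    return rng.choices(choices, weights=weights, k=1)[0]


def hbss_choice(choices, heuristic, bias=lambda r: 1.0 / r, rng=random):
    """Rank-biased: weight each choice by a bias function of its rank
    in the heuristic ordering (rank 1 is the heuristic's first pick)."""
    ranked = sorted(choices, key=heuristic, reverse=True)
    weights = [bias(r) for r in range(1, len(ranked) + 1)]
    return rng.choices(ranked, weights=weights, k=1)[0]
```

Repeating either selection at every decision point of a constructive search yields one stochastic iteration; a steeper bias function keeps the search closer to the deterministic heuristic.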

The drawback to stochastic sampling algorithms such as HBSS and VBSS is that 
they require identification of an appropriate domain heuristic, and search performance 
is ultimately tied to the power of the heuristic that is selected. Heuristics, however, are 
not infallible, and in most domains there does not exist a single dominating heuristic. 
Instead different heuristics tend to perform better or worse on different problem in- 
stances. In such cases, the choice of any single heuristic will ultimately be limiting and 
it would be advantageous to gain the collective power of several heuristics. The idea 
of exploiting a collection of heuristics to boost overall performance has been explored 
in other search contexts. Allen and Minton use secondary performance characteristics 
as indicators for which heuristic algorithm is performing more effectively for a given 
CSP instance [5]. Others have applied relatively simple learning algorithms to 
the problem of selecting from among alternative local search operators [6, 7]. Work on 
algorithm portfolios [8] and the related A-Teams framework [9] take a more aggressive 
approach, executing several different heuristic search algorithms in parallel. 

In this paper, we consider the problem of integrating multiple search heuristics 
within a stochastic sampling algorithm. Rather than carefully customizing a variant 
to use a composite heuristic, the approach taken here is to instead design a search con- 
trol framework capable of accepting several search heuristics and self-customizing a 
hybrid algorithm on a per problem instance basis. Generally speaking, our approach 
views each solution constructed by the stochastic sampling algorithm as a sample of 
the expected solution quality of the base heuristic. Over multiple restarts on a given 
problem instance, we construct solution quality distributions for each heuristic, and 
use this information to bias the selection of heuristic on subsequent iterations. Gomes 
et al.’s analysis and use of runtime distributions has led to much success in constraint 
satisfaction domains [10]. In a similar way, a hypothesis of this paper is that solution 
quality distributions can provide an analogous basis for understanding and enhancing 
the performance of stochastic sampling procedures in solving optimization problems. 

As suggested above, the search control framework developed in this paper uses 
online-generated statistical models of search performance to effectively combine mul- 
tiple search heuristics. Consider that a stochastic search algorithm samples a solution 
space, guided by a strong domain heuristic. Our conjecture is that the solutions found 
by this algorithm on individual iterations are generally “good” (with respect to the over- 
all solution space) and that these “good” solutions are rare events. This leads us to the 
body of work on extreme value theory (EVT). EVT is the statistical study of rare or 
uncommon events rather than usual ones (e.g., the study of what happens at the extreme of 
a distribution, in the tail). Given our conjecture, and from the perspective of EVT, we can 
view the distribution of solution qualities found by a stochastic heuristic search 
algorithm as a sort of snapshot of the tail of the distribution of solution qualities of the 
overall search space. We specifically employ a distribution called the Generalized Extreme 
Value (GEV) distribution to model the solution qualities given by individual iterations 
of the search algorithm. We implement this EVT-motivated approach in two ways: 1) 
using a well-known, but numerically intensive computation of a maximum likelihood 
estimation of the GEV; and 2) using kernel density estimation tuned assuming a GEV. 

Using VBSS as a base stochastic sampling procedure, we validate this EVT- 
motivated heuristic performance modeling and heuristic selection policy on the 
NP-Hard problem of resource-constrained project scheduling with time windows 
(RCPSP/max). As a baseline and to validate our EVT assumptions, we compare the 
performance of the approach with one that makes the naive assumption that the solution 
qualities are normally distributed. We further benchmark the approach against several 
well-known heuristic algorithms, finding that our EVT-motivated algorithm is competi- 
tive with the current best known heuristic algorithm for RCPSP/max; and in some cases 
even improves upon current best known solutions to difficult problem instances. 

2 Modeling a Solution Quality Distribution 

2.1 Extreme Value Theory Motivation 

Consider that the solutions to a hard combinatorial optimization problem computed on 
each iteration of a stochastic sampling algorithm are in fact at the extreme when the 
overall solution-space is considered. If one were to sample solutions uniformly at ran- 
dom, the probability is very low that any of the solutions generated by a stochastic 
search that is guided by a strong heuristic would be found. In other words, good solu- 
tions to any given problem instance from the class of problems of greatest interest to us 
are, in a sense, rare phenomena within the space of feasible solutions. 

For example, using Bresina’s concept of a quality density function (QDF) [11], we 
examined several problem instances of a weighted tardiness scheduling problem. A 
QDF is the distribution of solution qualities that one would obtain by sampling uni- 
formly from the space of possible solutions to a problem instance. For easy problem 
instances from our problem set, we found that the optimal solution was on average over 
6.4 standard deviations better than the average feasible solution to the problem; the 
value of the average solution given by a stochastic sampler guided by a strong domain 
heuristic was also over 6.4 standard deviations better than the average solution in the 
problem space [3]. Further, for hard problem instances, we found that the average solution 
given by a single iteration of the stochastic search was over 9.1 standard deviations 
better than the average random solution; the best known solution was on average 9.4 
standard deviations better than the average solution in the problem space [3]. 

2.2 Generalized Extreme Value Distribution 

With this noted, we turn to the field of extreme value theory, which deals with 
“techniques and models for describing the unusual rather than the usual” [12]. Consider an 
extreme value analog to the central limit theorem. Let $M_n = \max\{X_1, \ldots, X_n\}$ where 
$X_1, \ldots, X_n$ is a sequence of independent random variables having a common 
distribution function $F$. For example, perhaps the $X_i$ are the mean temperatures for each of the 
365 days in the year; then $M_n$ would correspond to the annual maximum temperature. 
To model $M_n$, extreme value theorists turn to the extremal types theorem [12]: 

Theorem 1. If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that 
$P((M_n - b_n)/a_n \leq z) \to G(z)$ as $n \to \infty$, where $G$ is a non-degenerate distribution 
function, then $G$ belongs to one of the following families: 

I: $G(z) = \exp\left(-\exp\left(-\left(\frac{z-b}{a}\right)\right)\right)$, $-\infty < z < \infty$ 
II: $G(z) = \exp\left(-\left(\frac{z-b}{a}\right)^{-\alpha}\right)$ if $z > b$ and otherwise $G(z) = 0$ 
III: $G(z) = \exp\left(-\left(-\left(\frac{z-b}{a}\right)\right)^{\alpha}\right)$ if $z < b$ and otherwise $G(z) = 1$ 

for parameters $a > 0$, $b$ and in the latter two cases $\alpha > 0$. 

These are known as the extreme value distributions, types I (Gumbel), II (Fréchet), 
and III (Weibull). The types II and III distributions are heavy-tailed - one bounded on 
the left and the other on the right. The Gumbel distribution is medium-tailed and un- 
bounded. These distributions are commonly reformulated into the generalization known 
as the generalized extreme value distribution (GEV): 

$$G(z) = \exp\left(-\left(1 + \xi\left(\frac{z-b}{a}\right)\right)^{-1/\xi}\right) \qquad (1)$$

defined on $\{z : 1 + \xi(z-b)/a > 0\}$, where $-\infty < b < \infty$, $a > 0$, and $-\infty < \xi < \infty$. The case 
where $\xi = 0$ is treated as the limit of $G(z)$ as $\xi$ approaches 0 to arrive at the Gumbel 
distribution. Under the assumption of Theorem 1, $P((M_n - b_n)/a_n \leq z) \approx G(z)$ for 
large enough $n$, which is equivalent to $P(M_n \leq z) \approx G((z - b_n)/a_n) = G^*(z)$ where 
$G^*(z)$ is some other member of the generalized extreme value distribution family. 

The main point here is that to model the distribution of the maximum element of 
a fixed-length sequence (or block) of identically distributed random variables (i.e., the 
distribution of “block maxima”), one needs simply to turn to the GEV distribution re- 
gardless of the underlying distribution of the individual elements of the sequence. 
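
For concreteness, Equation 1 can be evaluated with a few lines of Python. This is a minimal sketch: the function name `gev_cdf`, the tolerance used to detect the Gumbel case, and the handling of points outside the support are choices made for the example.

```python
import math


def gev_cdf(z, b, a, xi):
    """GEV distribution function G(z) with location b, scale a > 0
    and shape xi, following Equation 1."""
    s = (z - b) / a
    if abs(xi) < 1e-12:
        # Gumbel case, the limit of Equation 1 as xi approaches 0.
        return math.exp(-math.exp(-s))
    u = 1.0 + xi * s
    if u <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0 else 1.0
    return math.exp(-u ** (-1.0 / xi))
```

At the location parameter (`z == b`) every member of the family takes the value exp(-1), which gives a quick sanity check on an implementation.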

2.3 Modeling Solution Quality via the GEV 

Theorem 1 only explicitly applies to modeling the distribution of “block maxima”. The 
assumption we now make is that the quality distribution for a stochastic sampling al- 
gorithm using a strong heuristic to sample from the solution space behaves the same 
as (or at least similar to) the distribution of “block maxima” and thus its cumulative 
distribution function can be modeled by the GEV distribution. 

To use the GEV as our model, we must first recognize that we have been assuming 
throughout that our objective function must be minimized so we need a “block minima” 
analog to Equation 1. Let $M'_n = \min\{X_1, \ldots, X_n\}$. We want $P(M'_n < z)$. Let $M''_n = 
\max\{-X_1, \ldots, -X_n\}$. Therefore, $M'_n = -M''_n$ and $P(M'_n < z) = P(-M''_n < z) = 
P(M''_n > -z) = 1 - P(M''_n \leq -z)$. Therefore, assuming that the distribution function 
behaves according to a GEV distribution, the probability $P_i$ of finding a better solution 
than the best found so far ($B$) using heuristic $i$ can be defined as: 

$$P_i = 1 - G_i(-B) = 1 - \exp\left(-\left(1 + \xi_i\left(\frac{-B - b_i}{a_i}\right)\right)^{-1/\xi_i}\right) \qquad (2)$$
where the $b_i$, $a_i$, and $\xi_i$ are estimated from the negative of the sample values. To 
compute these parameters, we use Hosking’s maximum-likelihood estimator of the GEV 
parameters [13]. In estimating the GEV parameters, Hosking’s algorithm is called 
multiple times, if necessary. The first call uses initial estimates of the parameters as 
recommended by Hosking (set assuming a Gumbel distribution). If Hosking’s algorithm fails 
to converge, then a fixed number of additional calls are made with random initial 
values of the parameters. If convergence still fails, we use the values of the parameters as 
estimated by assuming a type I extreme value distribution (the Gumbel distribution).¹ 
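
Once the parameters have been estimated, Equation 2 is a direct computation. The sketch below assumes the parameters `(b, a, xi)` have already been fitted to the negated sample values by some estimator (the fitting step itself is not shown), and the function name `prob_improve` is hypothetical.

```python
import math


def prob_improve(best, b, a, xi):
    """P_i = 1 - G_i(-best) for a minimization objective, where
    (b, a, xi) are GEV parameters fitted to the negated samples."""
    s = (-best - b) / a
    if abs(xi) < 1e-12:
        g = math.exp(-math.exp(-s))            # Gumbel case
    else:
        u = 1.0 + xi * s
        if u <= 0.0:                           # outside the support
            g = 0.0 if xi > 0 else 1.0
        else:
            g = math.exp(-u ** (-1.0 / xi))
    return 1.0 - g
```

As expected for minimization, the estimated probability of improvement shrinks as the best value found so far decreases.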

2.4 Modeling Solution Quality Using Kernel Density Estimation 

A second possibility for estimating the quality distribution is Kernel Density Estimation 
(see [14]). A kernel density estimator makes few, if any, assumptions regarding the 
underlying distribution it models. It provides a non-parametric framework for estimating 
arbitrary probability densities. The advantage of this approach is that it should be 
possible to more closely estimate arbitrary solution quality distributions. Kernel density 
estimation takes local averages to estimate a density function by placing smoothed out 
quantities of mass at each data point. The kernel density estimator is defined as: 

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \qquad (3)$$

$K(\cdot)$ is a kernel function and $h$ is called the bandwidth (also sometimes called the scale 
parameter or spreading coefficient). The $X_i$ are the $n$ sample values (objective function 
value of the solutions generated by the $n$ iterations of the stochastic search algorithm). 
The kernel function we have chosen is the Epanechnikov kernel [15]: 

$$K(x) = \frac{3}{4\sqrt{5}}\left(1 - \frac{x^2}{5}\right) \text{ for } |x| < \sqrt{5} \text{ and otherwise } 0. \qquad (4)$$

Epanechnikov showed that this is the risk optimal kernel, but estimates using other 
smooth kernels are usually numerically indistinguishable. Thus, the form of the ker- 
nel can be chosen to best address computational efficiency concerns. In our case, the 
Epanechnikov kernel is a clear winner computationally for the following reasons: 

- We are most interested in ultimately computing the probability of finding a better 
solution than the best found so far. This kernel function allows us to easily compute 
the cumulative probability distribution for arbitrary solution quality distributions. 

- Due to the condition $|x| < \sqrt{5}$, only a limited number of sample values must be
considered, reducing the computational overhead.

¹ It should be noted that this fallback condition appears to occur rarely, if ever, in
our experiments.

Although the choice of kernel function is not critical in terms of numerical results, the
choice of bandwidth can be crucial. Epanechnikov showed that the optimal choice of
bandwidth is $h = \left(\frac{L}{nM}\right)^{1/5}$, where
$L = \int_{-\infty}^{\infty} K(x)^2\,dx$, where
$M = \int_{-\infty}^{\infty} (f''(x))^2\,dx$, and where $n$ is the number of samples [15].
Unfortunately, this computation depends on knowing the true distribution ($M$ depends on
$f(x)$). We assume that the underlying distribution is the Gumbel distribution. The
reasoning behind this assumption follows our EVT motivation. Given the Gumbel distribution
assumption, $M = \frac{1}{4a^5}$, where $a$ is the scale parameter of the Gumbel. Note that
the standard deviation of the Gumbel distribution is $\sigma = \frac{a\pi}{\sqrt{6}}$ [16].
From this, we have $a = \frac{\sigma\sqrt{6}}{\pi}$. We can now write $M$ in terms of the
sample standard deviation:
$M = \frac{\pi^5}{4(\sqrt{6})^5\sigma^5} = \frac{\pi^5}{144\sqrt{6}\,\sigma^5}$. We are
using the Epanechnikov kernel so $L = \frac{3}{5\sqrt{5}}$. This results in a value of $h$
computed as:

$h = 0.79\,s\,n^{-1/5}$, where $s = \min\{\sigma, Q/1.34\}$ and where $Q$ is the
interquartile range.
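
The constant 0.79 can be checked numerically. The sketch below, under the Gumbel
assumption stated above, integrates $(f''(x))^2$ with a simple midpoint rule, compares the
result with the closed form $M = 1/(4a^5)$, and then evaluates $(L/M)^{1/5}$ for
$\sigma = 1$. The function names, the integration limits, and the quartile convention used
for $Q$ (the default of Python's `statistics.quantiles`) are assumptions of this example.

```python
import math
import statistics

SQRT5 = math.sqrt(5.0)

# L = integral of K(x)^2 dx for K(x) = 3/(4 sqrt 5) (1 - x^2/5) on |x| < sqrt 5.
L = 3.0 / (5.0 * SQRT5)


def gumbel_roughness(a, steps=100_000, lo=-10.0, hi=40.0):
    """Midpoint-rule estimate of M = integral of (f''(x))^2 dx for a Gumbel density."""
    dz = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        z = lo + (k + 0.5) * dz  # standardized variable z = (x - b) / a
        u = math.exp(-z)
        # Second derivative of the standard Gumbel density exp(-z - exp(-z)).
        g2 = math.exp(-z - u) * (u * u - 3.0 * u + 1.0)
        total += g2 * g2 * dz
    return total / a ** 5  # f''(x) = g''(z) / a^3 and dx = a dz


def bandwidth_constant():
    """Constant c in h = c * sigma * n^(-1/5) under the Gumbel assumption."""
    a_unit = math.sqrt(6.0) / math.pi   # Gumbel scale when sigma = 1
    m_unit = 1.0 / (4.0 * a_unit ** 5)  # closed form M = 1 / (4 a^5)
    return (L / m_unit) ** 0.2


def bandwidth(samples):
    """h = 0.79 s n^(-1/5) with s = min(sigma, Q / 1.34)."""
    sigma = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    s = min(sigma, (q3 - q1) / 1.34)
    return 0.79 * s * len(samples) ** -0.2


if __name__ == "__main__":
    print(gumbel_roughness(1.0), bandwidth_constant())
```

For a unit-scale Gumbel the numerical integral agrees with $1/4$, and the constant
evaluates to roughly 0.791, consistent with the rounded value used in the text.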

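We are interested in the cumulative distribution function for the purpose of computing
the probability of finding a better solution than the best found so far. This can be
obtained from integrating the kernel density estimator. Thus, we have the probability
$P_i$ of finding a solution better than the best found so far, $B$, given heuristic $i$:²

$$P_i = \int_0^B \frac{1}{n_i h_i} \sum_{j=1}^{n_i} K\left(\frac{x - S_{i,j}}{h_i}\right) dx \qquad (5)$$

Here the $S_{i,j}$ are the $n_i$ sample values obtained with heuristic $i$, and $h_i$ is
the bandwidth computed from those samples.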



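Given our choice of the Epanechnikov kernel, this evaluates to:

$$P_i = \frac{3}{4 n_i h_i \sqrt{5}} \sum_{j:\ \left|\frac{B - S_{i,j}}{h_i}\right| < \sqrt{5}} \left( B - (S_{i,j} - h_i\sqrt{5}) - \frac{1}{5h_i^2}\left( \frac{B^3 - (S_{i,j} - h_i\sqrt{5})^3}{3} - \left(B^2 - (S_{i,j} - h_i\sqrt{5})^2\right) S_{i,j} + \left(B - (S_{i,j} - h_i\sqrt{5})\right) S_{i,j}^2 \right)\right) \qquad (6)$$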



It should be noted that if we maintain the samples in sorted order, then given that $B$
must be less than or equal to the smallest value in this list,³ we compute this sum until
we reach a sample $S_{i,j}$ such that $\left|\frac{B - S_{i,j}}{h_i}\right| \ge \sqrt{5}$.
Once a sample for which this condition holds is reached in the list, the summation can
end. Actually, rather than in a sorted list, we maintain the samples in a sorted
histogram, maintaining counts of the number of samples with given discrete values.
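
The closed-form sum can be cross-checked against direct numerical integration of the
kernel density estimate. The following Python sketch does both for one heuristic. It
assumes, as in the text, a minimization problem in which $B$ is no larger than any
sample, and additionally that no kernel extends below the lower bound of 0. The function
names and the sample values are hypothetical, and a plain sorted list stands in for the
sorted histogram.

```python
import math

SQRT5 = math.sqrt(5.0)


def epanechnikov(u):
    """Epanechnikov kernel with support |u| < sqrt(5)."""
    return 3.0 / (4.0 * SQRT5) * (1.0 - u * u / 5.0) if abs(u) < SQRT5 else 0.0


def prob_better_closed_form(best, samples, h):
    """Mass of the kernel density estimate below `best`, summed kernel by kernel."""
    total = 0.0
    for s in sorted(samples):
        if (s - best) / h >= SQRT5:
            break  # sorted order: every later sample is even further from `best`
        lo = s - h * SQRT5  # left edge of the kernel centred on s
        # Integral of (x - s)^2 from lo to best, expanded term by term.
        cubic = (best**3 - lo**3) / 3.0 - (best**2 - lo**2) * s + (best - lo) * s * s
        total += (best - lo) - cubic / (5.0 * h * h)
    return 3.0 * total / (4.0 * len(samples) * h * SQRT5)


def prob_better_numeric(best, samples, h, steps=20_000):
    """Midpoint-rule integral of the kernel density estimate up to `best`."""
    lo = min(samples) - h * SQRT5
    dx = (best - lo) / steps
    n = len(samples)
    acc = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx
        acc += sum(epanechnikov((x - s) / h) for s in samples) / (n * h) * dx
    return acc


if __name__ == "__main__":
    qualities = [100.0, 104.0, 110.0, 125.0]  # hypothetical makespans
    print(prob_better_closed_form(100.0, qualities, 4.0))
    print(prob_better_numeric(100.0, qualities, 4.0))
```

With a single sample equal to $B$, exactly half of that kernel lies below $B$, so the
closed form returns 0.5, which is a convenient sanity check.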



2.5 Selecting a Search Heuristic 

A method is needed for balancing the tradeoff between exploiting the current estimates
of the solution quality distributions given by the algorithm’s choices and the need for
exploration to improve these estimates. The k-armed bandit focuses on optimizing the
expected total sum of rewards from sampling from a multiarmed slot machine. At any
point during an iterative stochastic sampling search, there is a current best found
solution. Future iterations have the objective of finding a solution that is better than
the current best. From this perspective, a better analogy than the k-armed bandit would be
to consider a multiarmed slot machine in which the objective is to sample the arms to
optimize the expected best single sample - what we have termed the “Max k-Armed
Bandit Problem” [3]. Elsewhere, we showed that the optimal sampling strategy samples
the observed best arm at a rate that increases approximately double exponentially
relative to the other arms [3].

² This assumes a minimization problem with lower bound of 0 on the objective function.

³ Assuming we are minimizing an objective function.

Specific to our problem, this means sampling with the observed best heuristic with
frequency increasing double exponentially relative to the number of samples given the
other heuristics. Consider, as the exploration strategy, Boltzmann exploration as
commonly used in reinforcement learning [17] and simulated annealing [18]. With a
Boltzmann exploration strategy, we would choose to use heuristic $h_i$ with probability
$P(h_i)$:

$$P(h_i) = \frac{\exp((P_i F_i)/T)}{\sum_{j=1}^{H} \exp((P_j F_j)/T)} \qquad (7)$$



where $P_i$ is the probability of finding a solution better than the best found so far,
where $F_i$ is the ratio of the number of feasible solutions used in estimating $P_i$ to
the total number of samples with $i$, where there are $H$ heuristics to choose from, and
where $T$ is a temperature parameter. To get the double exponential sampling increase, we
need to decrease $T$ exponentially. For example, let $T = \exp(-N')$ where $N'$ is the
number of samples already taken and sample $h_i$ with probability:

$$P(h_i) = \frac{\exp((P_i F_i)/\exp(-N'))}{\sum_{j=1}^{H} \exp((P_j F_j)/\exp(-N'))}$$
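
Evaluated literally, this expression overflows floating-point arithmetic after a modest
number of samples, because $1/T = \exp(N')$ grows so quickly. A minimal Python sketch of
one way to compute the same probabilities is given below. Subtracting the largest score
before exponentiating leaves the ratios unchanged. The cap on $N'$ and all function names
are assumptions of this example, not part of the method described above.

```python
import math
import random


def selection_probabilities(p, f, num_samples):
    """Boltzmann probabilities over heuristics with temperature T = exp(-N')."""
    scores = [pi * fi for pi, fi in zip(p, f)]
    top = max(scores)
    # 1/T = exp(N'); the cap at 700 is an assumption that keeps exp() within float range.
    inv_t = math.exp(min(num_samples, 700))
    # Shifting by the top score leaves the ratios unchanged and avoids overflow.
    weights = [math.exp((s - top) * inv_t) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def choose_heuristic(p, f, num_samples, rng=random):
    """Draw the index of the heuristic to use for the next iteration."""
    probs = selection_probabilities(p, f, num_samples)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    p = [0.20, 0.10, 0.05]  # hypothetical P_i estimates
    f = [1.0, 0.9, 1.0]     # hypothetical feasibility ratios F_i
    for n_prime in (0, 2, 5, 50):
        print(n_prime, selection_probabilities(p, f, n_prime))
```

With $N' = 0$ the selection is close to uniform, and after a few dozen samples
essentially all probability mass sits on the heuristic with the largest $P_i F_i$.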



3 Experimental Design 

In this section, we consider the resource constrained project scheduling problem with time
windows (RCPSP/max). RCPSP/max is the RCPSP with generalized precedence rela- 
tions between start times of activities. It is a difficult makespan minimization problem 
well studied by the Operations Research community. Finding feasible solutions to in- 
stances of the RCPSP/max is NP-Hard, making the optimization problem very difficult. 

RCPSP/max Formalization. The RCPSP/max problem can be defined formally as follows.
Define $P = \langle A, \Delta, R \rangle$ as an instance of RCPSP/max. Let $A$ be the set
of activities $A = \{a_0, a_1, a_2, \ldots, a_n, a_{n+1}\}$. Activity $a_0$ is a dummy
activity representing the start of the project and $a_{n+1}$ is similarly the project end.
Each activity $a_j$ has a fixed duration $p_j$, a start-time $S_j$, and a completion-time
$C_j$ which satisfy the constraint $S_j + p_j = C_j$. Let $\Delta$ be a set of temporal
constraints between activity pairs $\langle a_i, a_j \rangle$ of the form
$S_j - S_i \in [T_{i,j}^{\min}, T_{i,j}^{\max}]$. The $\Delta$ are generalized precedence
relations between activities. The $T_{i,j}^{\min}$ and $T_{i,j}^{\max}$ are minimum and
maximum time-lags between the start times of pairs of activities. Let $R$ be the set of
renewable resources $R = \{r_1, r_2, \ldots, r_m\}$. Each resource $r_k$ has an integer
capacity $c_k \ge 1$. Execution of an activity $a_j$ requires one or more resources. For
each resource $r_k$, the activity $a_j$ requires an integer capacity $rc_{j,k}$ for the
duration of its execution. An assignment of start-times to the activities in $A$ is
time-feasible if all temporal constraints are satisfied and is resource-feasible if all
resource constraints are satisfied. A schedule is feasible if both sets of constraints are
satisfied. The problem is then to find a feasible schedule with minimum makespan $M$
where $M(S) = \max\{C_i\}$. That is, we wish to find a set of assignments to $S$ such that
$S_{sol} = \arg\min_S M(S)$. The maximum time-lag constraints are what makes this problem
especially difficult. In particular, due to the maximum time-lag constraints, finding
feasible solutions alone to this problem is NP-Hard.
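
The definitions of time-feasibility, resource-feasibility, and makespan translate
directly into a schedule checker. The Python sketch below is one possible encoding. The
`Instance` container, its field names, and the tiny example instance are assumptions
made for illustration. Resource usage is checked only at activity start times, which
suffices because usage can only increase when some activity starts.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass
class Instance:
    durations: list   # p_j for activities a_0 .. a_{n+1}
    time_lags: dict   # (i, j) -> (T_min, T_max), constraining S_j - S_i
    capacities: list  # c_k for each renewable resource r_k
    demands: list     # demands[j][k] = rc_{j,k}


def is_time_feasible(inst, start):
    """All generalized precedence relations S_j - S_i in [T_min, T_max] hold."""
    return all(lo <= start[j] - start[i] <= hi
               for (i, j), (lo, hi) in inst.time_lags.items())


def is_resource_feasible(inst, start):
    """No resource capacity is exceeded at any point in time."""
    for t in set(start):
        for k, cap in enumerate(inst.capacities):
            used = sum(inst.demands[j][k]
                       for j, s in enumerate(start)
                       if s <= t < s + inst.durations[j])
            if used > cap:
                return False
    return True


def makespan(inst, start):
    """M(S) = max over activities of the completion time C_j = S_j + p_j."""
    return max(s + p for s, p in zip(start, inst.durations))


if __name__ == "__main__":
    # Two real activities sharing one unit-capacity resource, plus dummy start and end.
    inst = Instance(
        durations=[0, 3, 2, 0],
        time_lags={(0, 1): (0, INF), (0, 2): (0, INF),
                   (1, 3): (3, INF), (2, 3): (2, INF),
                   (1, 2): (-5, 4)},  # a maximum time-lag between a_1 and a_2
        capacities=[1],
        demands=[[0], [1], [1], [0]],
    )
    schedule = [0, 0, 3, 5]
    print(is_time_feasible(inst, schedule), is_resource_feasible(inst, schedule),
          makespan(inst, schedule))
```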

Branch-and-Bound Approaches. There are many branch-and-bound approaches for the
RCPSP/max problem. Though for many problem instances it is too costly to execute a
branch-and-bound long enough to prove optimality, good solutions are often obtained
in a reasonable amount of computation time through truncation (i.e., not allowing the
search to run to completion). The current (known) best performing branch-and-bound
approach is that of Dorndorf et al. [19] (referred to later as B&B$_{DPP98}$).

Priority-Rule Methods. It should be noted that a priority rule method, as referred to 
here, is not the same as a dispatch policy. It actually refers to a backtracking CSP search 
that uses one or more priority-rules (dispatch heuristics) to choose an activity to sched- 
ule next, fixing its start time variable. The RCPSP/max is both an optimization problem 
and a CSP. When a start time becomes fixed, constraint propagation then takes place, 
further constraining the domains of the start time variables. The specific priority-rule 
method that we consider here is referred to as the “direct method” with “serial schedule 
generation scheme” [20, 21]. Franck et al. found the direct method with serial
generation scheme to perform better in general as compared to other priority-rule methods.

The serial schedule generation scheme requires a priority-rule or activity selection 
heuristic. There are a wide variety of such heuristics available in the literature. Neu- 
mann et al. recommend five in particular. These five heuristics are those that we later 
randomize and combine within a single stochastic search: 

- LST: smallest “latest start time” first: $\text{LST}_i = \frac{1}{1 + LS_i}$.

- MST: “minimum slack time” first: $\text{MST}_i = \frac{1}{1 + LS_i - ES_i}$.

- MTS: “most total successors” first: $\text{MTS}_i = |\text{Successors}_i|$, where
$\text{Successors}_i$ is the set of not necessarily immediate successors of $a_i$ in the
project network.

- LPF: “longest path following” first: $\text{LPF}_i = \text{lpath}(i, n+1)$, where
$\text{lpath}(i, n+1)$ is the length of the longest path from $a_i$ to $a_{n+1}$.

- RSM: “resource scheduling method”:
$\text{RSM}_i = \frac{1}{1 + \max(0,\ \max_{g \in \text{eligible set},\, g \ne i}(ES_i + p_i - LS_g))}$.

$LS_i$ and $ES_i$ in these heuristics refer to the latest and earliest start times of the
activities. Note that we have rephrased a few of these heuristics from Neumann et al.’s
definitions so that for each, the eligible activity with the highest heuristic value is
chosen. The eligible set of activities are those that can be time-feasibly scheduled given
constraints involving already scheduled activities.
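
To make the "highest value is chosen" convention concrete, here is a small Python sketch
that evaluates the five rules for one eligible activity, taking LST and MST as the
reciprocals $1/(1 + LS_i)$ and $1/(1 + LS_i - ES_i)$, MTS as the successor count, LPF as
the longest path length to the project end, and RSM as
$1/(1 + \max(0, \max_{g \ne i}(ES_i + p_i - LS_g)))$ over the other eligible activities.
The function names, the dictionary-based inputs, and the example numbers are assumptions
of this sketch. Computing the start-time windows and longest paths is outside its scope.

```python
def priority_values(i, eligible, es, ls, p, successors, lpath_to_end):
    """Heuristic values of eligible activity i; larger means more preferred."""
    # Largest delay that scheduling i first would impose on another eligible activity.
    worst = max((es[i] + p[i] - ls[g] for g in eligible if g != i), default=0)
    return {
        "LST": 1.0 / (1.0 + ls[i]),
        "MST": 1.0 / (1.0 + ls[i] - es[i]),
        "MTS": len(successors[i]),
        "LPF": lpath_to_end[i],
        "RSM": 1.0 / (1.0 + max(0, worst)),
    }


def select_activity(eligible, rule, **data):
    """Deterministic dispatch: pick the eligible activity with the highest value."""
    return max(eligible, key=lambda i: priority_values(i, eligible, **data)[rule])


if __name__ == "__main__":
    # Hypothetical start-time windows and network data for two eligible activities.
    data = dict(
        es={1: 0, 2: 2},
        ls={1: 4, 2: 2},
        p={1: 3, 2: 2},
        successors={1: {2, 3}, 2: {3}},
        lpath_to_end={1: 5, 2: 2},
    )
    for rule in ("LST", "MST", "MTS", "LPF", "RSM"):
        print(rule, select_activity([1, 2], rule, **data))
```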

Truncating the search when a threshold number of backtracks has been reached 
and restarting with a different heuristic each restart has been proposed as an efficient 
and effective heuristic solution procedure. Later, we refer to the following multiple run 
truncated priority methods: 
- PR$_{FNS5}$: Executing the direct method with serial generation scheme 5 times, once
with each of the heuristics described above, and taking the best solution of the
5 as originally suggested by Franck et al. [20, 21]. Results shown later are of our
implementation.

- PR$_{FN10}$: Similarly, this is a best of 10 heuristics. The results shown later are as
reported by Dorndorf et al. [19] and Cesta et al. [22] of Franck and Neumann’s best
of 10 heuristics method [23].⁴

Iterative Sampling Earliest Solutions. Cesta et al. present an algorithm for RCPSP/max 
that they call Iterative Sampling Earliest Solutions (ISES) [22]. ISES begins by finding
a time feasible solution with a maximum horizon (initially very large) on the project’s 
makespan, assuming one exists. The resulting time-feasible solution, for any interest- 
ing problem instance, is generally not resource-feasible. ISES proceeds by iteratively 
“leveling” resource-constraint conflicts. That is, it first detects sets of activities that 
temporally overlap and whose total resource requirement exceeds the resource capac- 
ity. Given the set of resource-constraint conflicts, it chooses one of the conflicts using 
heuristic-equivalency (i.e., chooses randomly from among all resource-conflicts within 
an “acceptance band” in heuristic value from the heuristically preferred choice). It then 
levels the chosen conflict by posting a precedence constraint between two of the activi- 
ties in the conflicted set. It continues until a time-feasible and resource-feasible solution 
is found or until some resource-conflict cannot be leveled. This is then iterated some 
fixed number of times within a stochastic sampling framework. Then, given the best
solution found during the stochastic sampling process, the entire algorithm is repeated
iteratively for smaller and smaller horizons. Specifically, the horizon is repeatedly set to 
the makespan of the best solution found so far until no further improvement is possible. 
Cesta et al. show ISES to perform better than the previous best heuristic algorithm for
the RCPSP/max problem (namely PR$_{FN10}$).

Performance Criteria. The set of benchmark problem instances that we use in the
experimental study of this section is that of Schwindt.⁵ There are 1080 problem
instances in this problem set. Of these, 1059 have feasible solutions and the other 21 
are provably infeasible. Each instance has 100 activities and 5 renewable resources. In 
the experiments that follow, we use the following performance criteria which have been 
used by several others to compare the performance of algorithms for the RCPSP/max 
problem: 

- $\Delta_{LB}$: the average relative deviation from the known lower bound, averaged across
all problem instances for which a feasible solution was found. Note that this is
based on the number of problem instances for which the given algorithm was able
to find a feasible solution and thus might be based on a different number of problem
instances for each algorithm compared. This criterion, as defined, is exactly as used
by all of the other approaches to the problem available in the literature.

- NO: the number of optimal solutions found. Currently, there are known optimal
solutions for 789 of the 1080 problem instances.

- NF: the number of feasible solutions found. Of the 1080 problem instances, 1059
possess at least one feasible solution. The other 21 can be proven infeasible (e.g.,
by the preprocessing step of the priority-rule method).

- TIME: CPU time in seconds.

⁴ Franck and Neumann’s technical report describing this best of 10 strategy is no longer
available according to both the library at their institution as well as the secretary of
their lab. We have been unable to find out what the 10 heuristics are that produce these
results.

⁵ http://www.wior.uni-karlsruhe.de/LS_Neumann/Forschung/ProGenMax/rcpspmax.html

For all stochastic algorithms, values shown are averages across 10 runs. Values in paren- 
theses are best of the 10 runs. In the results, as an added comparison point, we list the 
above criteria for the current best known solutions as BEST. Note that BEST is the best 
known prior to the algorithms presented in this paper. We further improve upon the best 
known solutions to some of the problem instances, but this is not considered in BEST. 
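
For reference, the three solution-quality criteria can be computed from per-instance
results as in the Python sketch below. It assumes that the deviation is reported as a
percentage, that an instance without a feasible solution is recorded as `None`, and that
a solution counts as optimal when its makespan equals a known optimum. The function name
and the instance labels are hypothetical.

```python
def summarize(results, lower_bounds, optima):
    """Return (delta_lb, no, nf) for one algorithm run over a benchmark set.

    results:      instance name -> makespan found, or None if no feasible solution
    lower_bounds: instance name -> known lower bound on the makespan
    optima:       instance name -> optimal makespan, for instances where it is known
    """
    feasible = {name: m for name, m in results.items() if m is not None}
    nf = len(feasible)
    no = sum(1 for name, m in feasible.items() if optima.get(name) == m)
    if nf == 0:
        return float("nan"), no, nf
    # Average only over instances for which this algorithm found a feasible solution.
    deviations = [(m - lower_bounds[name]) / lower_bounds[name]
                  for name, m in feasible.items()]
    return 100.0 * sum(deviations) / nf, no, nf


if __name__ == "__main__":
    results = {"a": 110, "b": 100, "c": None}  # hypothetical instances
    lower_bounds = {"a": 100, "b": 100, "c": 50}
    optima = {"b": 100}
    print(summarize(results, lower_bounds, optima))
```

Because the average runs only over the feasible subset, two algorithms with different NF
values are averaged over different instance sets, which is the caveat noted for
$\Delta_{LB}$ above.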

Value-Biased Stochastic Sampling (VBSS). The first part of our approach uses an algo- 
rithm called VBSS [3, 2] to randomize the priority-rule method. Rather than following a 
priority rule deterministically during the course of the search, we bias a stochastic selec- 
tion process by a function of the heuristic values. The backtracking priority-rule method 
is truncated as before when a threshold number of backtracks has been reached; and 
then restarted some number of times. The best feasible solution found of these restarts 
is chosen. In the results that follow, we refer to using the stochastic sampling framework 
VBSS within the priority-rule method for $N$ iterations by: LST[N]; MST[N]; MTS[N];
LPF[N]; and RSM[N]. The bias functions used within VBSS are in each case polynomial:
degree 10 for each of LST and MST; degree 2 for MTS; degree 3 for LPF; and
degree 4 for RSM. These were chosen during a small number of exploratory solution 
runs for a small sample of problem instances. NAIVE[N] refers to randomly sampling 
an equal number of times with each of the five heuristics ($N$ iterations total).
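
A minimal Python sketch of a value-biased choice with a polynomial bias follows. It
assumes that each candidate is selected with probability proportional to its heuristic
value raised to the given degree, and that heuristic values are non-negative with higher
values preferred. The function names and example values are hypothetical, and the
surrounding backtracking search is not shown.

```python
import random


def vbss_probabilities(values, degree):
    """Selection probabilities proportional to (heuristic value) ** degree."""
    weights = {c: v ** degree for c, v in values.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def vbss_choose(values, degree, rng=random):
    """Randomly pick one candidate, biased toward high heuristic values."""
    probs = vbss_probabilities(values, degree)
    candidates = list(probs)
    return rng.choices(candidates, weights=[probs[c] for c in candidates], k=1)[0]


if __name__ == "__main__":
    values = {"a1": 2.0, "a2": 1.0}  # hypothetical heuristic values
    print(vbss_probabilities(values, 2))   # mild bias
    print(vbss_probabilities(values, 10))  # nearly deterministic
    print(vbss_choose(values, 2, random.Random(0)))
```

A higher degree concentrates the choice on the heuristically preferred candidate, so the
degree controls how far the search strays from the deterministic priority rule.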

Generating and Using Models of Solution Qualities. Further, using the methods of 
modeling the distribution of solution qualities presented in this paper, we enhance the 
performance of the VBSS priority-rule method, effectively combining multiple heuris- 
tics within a single multistart stochastic search. We refer to this approach using the 
above five heuristics for $N$ iterations according to the estimation method as follows:
NORM[N] using Normal distribution estimates; KDE[N] using kernel density esti- 
mates; and GEV[N] using GEV distribution estimates. 

4 Experimental Results 

Table 1 shows a summary of the results of using VBSS with the priority-rule method 
and Table 2 shows a summary of the results of generating and using models of solution 
quality to enhance the search. We can make a number of observations: 

- For any number of iterations of the VBSS enhanced priority-rule method, the best
single heuristic to use in terms of finding optimal solutions is the “longest-path
following first” (LPF) heuristic. However, we can also observe that the VBSS method
using LPF is worst in terms of the number of feasible solutions found. Using LPF
and VBSS appears to perform very well on the problem instances for which it can
find feasible solutions, while at the same time having difficulties finding any
feasible solution for a large number of other problem instances.

Table 1. Summary of the results of using VBSS with the priority-rule method.

Algorithm    $\Delta_{LB}$   NO             NF               TIME
LPF[20]      4.4 (4.2)       616 (628)      942 (956)        0.4
LST[20]      6.1 (5.7)       600.7 (612)    1041 (1043)      0.2
MTS[20]      4.6 (4.4)       600 (617)      953.3 (965)      0.4
MST[20]      6.6 (6.1)       598.3 (606)    1038.7 (1041)    0.3
RSM[20]      8.5 (7.7)       447.7 (494)    1027.3 (1031)    0.2

LPF[100]     4.2 (4.0)       632.3 (642)    959.7 (969)      1.1
MTS[100]     4.4 (4.3)       626 (638)      970 (981)        1.1
LST[100]     5.5 (5.2)       617.3 (625)    1044 (1044)      0.6
MST[100]     5.9 (5.6)       609 (614)      1042 (1043)      0.8
RSM[100]     7.4 (6.9)       510.3 (536)    1033.3 (1035)    0.6

LPF[200]     4.1 (4.0)       638.7 (647)    965 (974)        2.1
MTS[200]     4.3 (4.2)       634 (648)      979 (986)        2.0
LST[200]     5.3 (5.1)       625.3 (633)    1044.3 (1045)    1.1
MST[200]     5.7 (5.5)       614.3 (623)    1043.3 (1044)    1.5
RSM[200]     7.1 (6.5)       529.7 (555)    1034.7 (1036)    1.0

LPF[400]     4.1 (4.0)       643.3 (650)    972.3 (980)      4.0
MTS[400]     4.3 (4.2)       641.7 (654)    983.7 (989)      3.7
LST[400]     5.2 (4.9)       631 (638)      1044.7 (1045)    1.9
MST[400]     5.5 (5.3)       619 (629)      1043.7 (1045)    2.8
RSM[400]     6.7 (6.3)       544 (564)      1035.7 (1037)    1.7

- We observe similar behavior when the second best heuristic in terms of number
of optimal solutions found (VBSS using the “most total successors” (MTS)) is
considered. Like VBSS using LPF, VBSS using MTS performs poorly in terms of finding
feasible solutions to the problems of the benchmark set.

- Although VBSS using any of the other three heuristics does not perform as well in
terms of finding optimal solutions as compared to using LPF or MTS, using these
other heuristics allows the search to find feasible solutions for many more of the
problem instances as compared to using only LPF or MTS. Thus, we can see that by
combining the five heuristics, either by the naive strategy or by using quality models,
we can find feasible solutions to nearly all of the 1059 problem instances on
average, while at the same time combining the strengths of the individual heuristics
in terms of finding optimal, or near-optimal, solutions.

- Comparing the use of quality models to guide the choice of search heuristic to the
naive strategy of giving an equal number of iterations to each of the heuristics, we
see that the naive strategy is always the worst in terms of finding optimal solutions.
Somewhat more interestingly, it is also always worst in terms of CPU time. Despite
the overhead required for estimating the solution quality models, the naive strategy
appears to be generally slower - as much as 2.5 seconds slower in the 2000 iteration
case. The reason for this is that although there is extra computational overhead in
the modeling, using the models gives fewer iterations to heuristics that appear less
likely to find feasible solutions. The naive strategy results in more iterations that do
not find a feasible solution, thus performing the maximum number of backtracking
steps allowed by the serial generation scheme for such infeasible iterations.

Vincent A. Cicirello and Stephen F. Smith

Table 2. Summary of the results of using VBSS and models of solution quality to enhance the
priority-rule method.

Algorithm      $\Delta_{LB}$   NO             NF               TIME
GEV[100]       5.3 (4.9)       650.7 (667)    1050.7 (1053)    0.8
KDE[100]       5.3 (4.9)       649.7 (662)    1050.7 (1053)    0.8
NORM[100]      5.3 (4.9)       648.7 (661)    1050.7 (1053)    0.8
NAIVE[100]     5.3 (5.0)       646.3 (650)    1050 (1052)      0.9

KDE[500]       4.8 (4.6)       665.7 (680)    1053 (1055)      3.1
NORM[500]      4.9 (4.6)       662.3 (673)    1053 (1055)      3.0
GEV[500]       4.9 (4.6)       660 (677)      1053 (1055)      3.2
NAIVE[500]     4.8 (4.6)       658.3 (666)    1052.7 (1055)    3.7

KDE[1000]      4.7 (4.5)       670.3 (683)    1054.7 (1057)    5.8
GEV[1000]      4.8 (4.5)       667 (682)      1054.7 (1057)    6.5
NORM[1000]     4.8 (4.5)       666.7 (678)    1054.7 (1057)    5.8
NAIVE[1000]    4.7 (4.5)       664.7 (673)    1054.7 (1057)    7.0

KDE[2000]      4.6 (4.4)       675.7 (689)    1057 (1059)      11.2
NORM[2000]     4.7 (4.4)       672.3 (685)    1057 (1059)      11.0
GEV[2000]      4.7 (4.4)       672.3 (685)    1057 (1059)      13.0
NAIVE[2000]    4.6 (4.4)       669.7 (678)    1057 (1059)      13.5

- Of the three methods for estimation, kernel density estimation performs best for the 
RCPSP/max problem. Except for N = 100, KDE[N] finds more optimal solutions 
than the other considered methods. Furthermore, KDE[N] requires significantly 
less CPU time than does GEV[N] (at least for the particular estimation procedure 
of the GEV distribution employed here). Also, the additional overhead of KDE[N] 
compared to NORM[N] appears to be negligible given the CPU timing results. 

Table 3 lists the results of a comparison of the enhanced priority-rule method and 
other algorithms, including branch-and-bound approaches and stochastic search algo- 
rithms. We can make the following observations: 

- The best performing heuristic method is clearly KDE[N]. In approximately 1/6
of the CPU time used by the previous best performing heuristic method - ISES
- KDE[1000] finds as many optimal solutions with a significantly lower average
deviation from the known lower bounds. In less than 1/3 of the CPU time required
by ISES, KDE[2000] consistently finds as many feasible solutions as ISES;
KDE[2000] consistently finds more optimal solutions than ISES; and KDE[2000]
on average finds solutions that deviate significantly less from the known lower
bounds as compared to ISES.

- In approximately 1/6 of the CPU time, KDE[2000] on average performs as well as
the current best branch-and-bound algorithm - B&B$_{DPP98}$ - in terms of deviation
from lower bounds (and better than B&B$_{DPP98}$ for the best run of KDE[2000]).
However, B&B$_{DPP98}$ finds more optimal solutions than KDE[2000]. KDE[2000]
is a competitive alternative to truncated branch-and-bound if one requires good
solutions but not necessarily optimal solutions in a highly limited amount of time.
Table 3. Comparison of the enhanced priority-rule method with other algorithms for the
RCPSP/max problem.

Algorithm       $\Delta_{LB}$   NO             NF               TIME
BEST            3.3             789            1059             -
B&B$_{DPP98}$   4.6             774            1059             66.7ᵃ
PR$_{FNS5}$     6.5             603            991              0.2
PR$_{FN10}$     7.7             601            1053             n/aᶜ
ISES            8.0 (7.3)       669.8 (683)    1057 (1059)      35.7ᵇ
KDE[1000]       4.7 (4.5)       670.3 (683)    1054.7 (1057)    5.8
KDE[2000]       4.6 (4.4)       675.7 (689)    1057 (1059)      11.2

ᵃ Adjusted from original publication by a factor of $\frac{200}{300}$. The
branch-and-bound algorithm was implemented on a 200 MHz Pentium, while we used for our
algorithms a Sun Ultra 10 / 300 MHz.
ᵇ Adjusted from original publication by a factor of $\frac{266}{300}$. ISES was
originally implemented on a Sun UltraSparc 30 / 266 MHz, while we used for our
algorithms a Sun Ultra 10 / 300 MHz.
ᶜ Timing results were not available in some cases. This is indicated by “n/a”.

Table 4. New best known solutions found by the algorithms of this paper. LB is the lower bound
for the makespan. Old is the previous best known. New is the new best known.

Instance   LB     Old    New    Algorithm(s)
C364       341    372    365    MTS[100]
D65        440    539    521    KDE[2000], GEV[2000]
D96        434    450    445    LPF[20]
D127       428    445    434    LPF[200]
D277       558    575    569    KDE[2000], GEV[2000]

Table 4 lists the problem instances for which we were able to improve upon the 
current best known solutions. VBSS using the LPF heuristic is able to improve upon 
the best known solutions to a couple of problem instances. The same is true of VBSS 
using MTS. KDE[2000] and GEV[2000] also improve upon a couple of additional best
known solutions. 

5 Summary and Conclusions 

In this paper, we introduced a general framework for combining multiple search heuris- 
tics within a single stochastic search. The stochastic search algorithm that the study 
focused on was that of VBSS, which is a non-systematic tree-based iterative search that
uses randomization to expand the search around a heuristic’s prescribed search-space 
region. The approach recommended by this paper, however, can be applied to other 
search algorithms that rely on the advice of a search heuristic and for any problem for 
which there is no one heuristic that is obviously better than others. 
In developing the approach to combining multiple search heuristics, we have
conjectured that the distribution of the quality of solutions produced by a stochastic search
algorithm that is guided by a strong domain heuristic can best be modelled by a family 
of distributions motivated by extreme value theory. This leads to the use of the GEV 
distribution within our framework. Two methods of implementing the GEV have been 
considered: 1) maximum likelihood estimates computed by potentially costly numeri- 
cal methods; and 2) kernel density estimation using a bandwidth parameter tuned under 
the assumption of a GEV distribution. 

The effectiveness of this approach was validated using the NP-Hard constrained op- 
timization problem known as RCPSP/max. On standard benchmark RCPSP/max prob- 
lems, our EVT-motivated approach was shown to be competitive with the current best 
known heuristic algorithms for the problem. The best available truncated branch-and- 
bound approach is capable of finding a greater number of optimal solutions, but at a 
much greater computational cost. Our EVT-motivated approach is, however, able to 
find more optimal solutions than ISES (the previous best known heuristic algorithm for 
RCPSP/max) and with less deviation than ISES from the known lower bounds on solu- 
tion quality. The approach we have taken in this paper has also improved upon current 
best known solutions to difficult benchmark instances. 

One potentially interesting future direction to explore is whether or not there is 
any connection between the heavy-tailed nature of runtime distributions of CSP search 
algorithms noted by Gomes et al. [10] and the heavy-tailed nature of the solution quality 
distributions observed in our own work - the extreme value distributions type II & III 
are both heavy-tailed. Are the runtime and solution quality distributions in constrained 
optimization domains at all correlated, and if so can this be used to enhance search? 
This is a direction that will be worth exploring. 
Abstract. We analyze the complexity of optimization problems
expressed using valued constraints. This very general framework includes
a number of well-known optimization problems such as MAX-SAT, and 
Weighted MAX-SAT, as well as properly generalizing the classical 
CSP framework by allowing the expression of preferences. We focus on 
valued constraints over Boolean variables, and we establish a dichotomy 
theorem which characterizes the complexity of any problem involving a 
fixed set of constraints of this kind. 

1 Introduction 

In the classical constraint satisfaction framework each constraint allows some 
combinations of values and disallows others. A number of authors have suggested 
that the usefulness of this framework can be greatly enhanced by extending 
the definition of a constraint to assign different costs to different assignments, 
rather than simply allowing some and disallowing others [1]. Problems involving
constraints of this form deal with optimization as well as feasibility: we seek 
an assignment of values to all of the variables having the least possible overall 
combined cost. 

In this extended framework a constraint can be seen as a cost function, 
mapping each possible combination of values to a measure of undesirability. 
Several alternative mathematical frameworks for such cost functions have been 
proposed in the literature, including the very general frameworks of ‘semi-ring 
based constraints’ and ‘valued constraints’ [1]. For simplicity, we shall adopt the 
valued constraint framework here (although our results can easily be adapted to 
the semi-ring framework, for appropriate semi-ring structures). This very gen- 
eral framework includes a number of well-known optimization problems such as 
MAX-CSP, MAX-SAT, and Weighted MAX-SAT, as well as properly gen- 
eralizing the classical CSP framework by allowing the expression of preferences. 

In general, optimization problems in this framework are NP-hard, so it is 
natural to investigate what restrictions can be imposed to make them tractable. 
One way to achieve this is to restrict the form of the cost functions which are 
allowed in problem instances. Such a restricted set of cost functions is called a
valued constraint language. In this paper we investigate the complexity of 
problems involving different kinds of valued constraint languages. We focus on 
the Boolean case, where each variable can take just two different values, and 
we obtain a complete characterization of the complexity for all possible valued 
constraint languages over Boolean variables. 

Our results generalize a number of earlier results for particular forms of 
optimization problem involving Boolean constraints. For example, Creignou et
al. obtained a complete characterization of the complexity of different constraint
languages for the Weighted MAX-SAT problem [2], where all costs are finite, 
and a cost is associated with each individual constraint (rather than with each 
individual combination of values for that constraint, as we allow here). 

2 Definitions 

In the valued constraint framework, a constraint is specified by a function which 
assigns a cost to each possible assignment of values for the variables it is
constraining. In general, costs may be chosen from any valuation structure, satisfying
the following definition. 

Definition 1. A valuation structure, $\chi$, is a totally ordered set, with a minimum
and a maximum element (denoted 0 and $\infty$), together with a commutative,
associative binary aggregation operator (denoted +), such that for all
$\alpha, \beta, \gamma \in \chi$

$$\alpha + 0 = \alpha \qquad (1)$$

$$\alpha + \gamma \ge \beta + \gamma \text{ whenever } \alpha \ge \beta. \qquad (2)$$

In this paper we shall use the valuation structure N, consisting of the natural 
numbers together with infinity, with the usual ordering and the usual addition 
operation. 

Definition 2. An instance of the valued constraint satisfaction problem, VCSP, 
is a tuple $\mathcal{P} = (V, D, C, \chi)$ where: 

— $V$ is a finite set of variables; 

— $D$ is a finite set of possible values for these variables; 

— $\chi$ is a valuation structure representing possible costs; 

— $C$ is a set of constraints. 

Each element of $C$ is a pair $c = (\sigma, \phi)$, where $\sigma$ is a tuple of 
variables called the scope of $c$, and $\phi$ is a mapping from $D^{|\sigma|}$ 
to $\chi$, called the cost function of $c$. 

Throughout the paper, the $i$th component of a tuple $t$ will be denoted $t[i]$, 
and the length of $t$ will be denoted $|t|$. For any two $k$-tuples $u, v$ we 
will say that $u \leq v$ if and only if $u[i] \leq v[i]$ for $i = 1, 2, \ldots, k$. 

Definition 3. For any VCSP instance $\mathcal{P} = (V, D, C, \chi)$, an 
assignment for $\mathcal{P}$ is a mapping $s$ from $V$ to $D$. The cost of an 
assignment $s$, denoted $Cost_{\mathcal{P}}(s)$, is given by the sum (i.e., 
aggregation) of the costs for the restrictions of $s$ onto each constraint 
scope, that is, 

$$Cost_{\mathcal{P}}(s) = \sum_{((v_1, v_2, \ldots, v_m), \phi) \in C} \phi(s(v_1), s(v_2), \ldots, s(v_m)).$$

A solution to $\mathcal{P}$ is an assignment with minimal cost, and the goal is 
to find a solution. 
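
As an illustration of Definitions 2 and 3, the following Python sketch evaluates 
the cost of an assignment and finds a solution by exhaustive search. It is only 
a sketch under stated assumptions: the names `cost`, `solve`, `neq`, `prefer0` 
and `toy` are hypothetical, the valuation structure is modelled with Python 
integers plus `math.inf`, and the search is exponential in the number of 
variables, so it says nothing about tractability. 

```python
from itertools import product
from math import inf


def cost(constraints, s):
    """Cost of assignment s: the sum of each cost function applied to the
    restriction of s onto that constraint's scope."""
    return sum(phi(*(s[v] for v in scope)) for scope, phi in constraints)


def solve(variables, domain, constraints):
    """Return (minimal cost, one minimising assignment) by exhaustive search."""
    best_cost, best = inf, None
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        c = cost(constraints, s)
        if best is None or c < best_cost:
            best_cost, best = c, s
    return best_cost, best


# Hypothetical toy instance: x and y must differ, and each prefers the value 0.
neq = lambda a, b: 0 if a != b else inf
prefer0 = lambda a: a  # cost 0 for value 0, cost 1 for value 1
toy = [(("x", "y"), neq), (("x",), prefer0), (("y",), prefer0)]
```

Calling `solve(["x", "y"], [0, 1], toy)` enumerates all four assignments; the two 
assignments in which `x` and `y` differ have finite cost. 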

Example 1 (SAT). For any instance $\mathcal{P}$ of the standard propositional 
satisfiability problem, SAT, we can define a corresponding valued constraint 
satisfaction problem instance $\hat{\mathcal{P}}$ in which the range of the cost 
functions of all the constraints is the set $\{0, \infty\}$. For each clause $c$ 
of $\mathcal{P}$, we define a corresponding constraint $\hat{c}$ of 
$\hat{\mathcal{P}}$ with the same scope; the cost function of $\hat{c}$ maps 
each tuple of values allowed by $c$ to 0, and each tuple disallowed by $c$ to 
$\infty$. 

In this case the cost of an assignment $s$ for $\hat{\mathcal{P}}$ equals the 
minimal possible cost, 0, if and only if $s$ satisfies all of the clauses of 
$\mathcal{P}$. 

Example 2 (MAX-SAT). In the standard MAX-SAT problem the aim is to find an 
assignment to the variables which maximizes the number of satisfied clauses. 
For any instance $\mathcal{P}$ of MAX-SAT, we can define a corresponding valued 
constraint satisfaction problem instance $\mathcal{P}^{\#}$ in which the range 
of the cost functions of all the constraints is the set $\{0, 1\}$. For each 
clause $c$ of $\mathcal{P}$, we define a corresponding constraint $c^{\#}$ of 
$\mathcal{P}^{\#}$ with the same scope; the cost function of $c^{\#}$ maps each 
tuple of values allowed by $c$ to 0, and each tuple disallowed by $c$ to 1. 

In this case the cost of an assignment $s$ for $\mathcal{P}^{\#}$ equals the 
total number of clauses of $\mathcal{P}$ which are violated by $s$. Hence a 
solution to $\mathcal{P}^{\#}$ corresponds to an assignment of $\mathcal{P}$ 
which violates the minimum number of clauses, and hence satisfies the maximum 
number of clauses. 

A similar construction can be carried out for the weighted version of the 
MAX-SAT problem. 
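
The constructions of Examples 1 and 2 can be sketched in Python as follows. 
This is a minimal illustration, not part of the original paper: the clause 
representation (a list of `(variable, sign)` literals) and the names 
`clause_cost`, `max_sat_as_vcsp` and `min_cost` are assumptions made for the 
example. Passing `violated_cost=math.inf` instead of 1 gives the SAT encoding 
of Example 1, and passing a clause weight gives the weighted version. 

```python
from itertools import product


def clause_cost(signs, violated_cost=1):
    """Cost function of a clause; signs[i] is True for a positive literal.
    Satisfying tuples cost 0, the single violating tuple costs violated_cost."""
    def phi(*values):
        satisfied = any(bool(v) == sign for v, sign in zip(values, signs))
        return 0 if satisfied else violated_cost
    return phi


def max_sat_as_vcsp(clauses, violated_cost=1):
    """Translate clauses (lists of (variable, sign) literals) into constraints
    of the form (scope, cost function), one constraint per clause."""
    return [
        (tuple(var for var, _ in clause),
         clause_cost([sign for _, sign in clause], violated_cost))
        for clause in clauses
    ]


def min_cost(variables, constraints):
    """Minimal total cost over all Boolean assignments (exhaustive search)."""
    best = None
    for values in product((0, 1), repeat=len(variables)):
        s = dict(zip(variables, values))
        c = sum(phi(*(s[v] for v in scope)) for scope, phi in constraints)
        best = c if best is None else min(best, c)
    return best


# Hypothetical instance: x, not-x, and (x or y). At least one clause must fail.
clauses = [[("x", True)], [("x", False)], [("x", True), ("y", True)]]
```

With these three clauses, `min_cost(["x", "y"], max_sat_as_vcsp(clauses))` counts 
the smallest possible number of violated clauses. 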

The problem of finding a solution to a valued constraint satisfaction problem is 
an NP optimization problem, that is, it lies in the complexity class NPO (see [2] 
for a formal definition of this class). It follows from Examples 1 and 2 that 
there is a polynomial-time reduction from some known NP-hard problems to the 
general VCSP. To achieve more tractable versions of VCSP, we will now consider 
the effect of restricting the forms of cost function allowed in the constraints. 

Definition 4. Let $\chi$ be a valuation structure. A valued Boolean constraint 
language with costs in $\chi$ is defined to be a set of functions, $\Gamma$, 
such that each $\phi \in \Gamma$ is a function from $\{0, 1\}^m$ to $\chi$, for 
some natural number $m$, where $m$ is called the arity of $\phi$. 

The class VCSP($\Gamma$) is defined to be the class of all VCSP instances where 
the cost functions of all constraints lie in $\Gamma$. 

For any valued Boolean constraint language $\Gamma$, if every instance in 
VCSP($\Gamma$) can be solved in polynomial time then we will say that $\Gamma$ 
is tractable. On the other hand, if there is a polynomial-time reduction from 
some NP-hard problem to VCSP($\Gamma$), then we shall say that $\Gamma$ is 
NP-hard. 

Example 3 (SAT and MAX-SAT). Let $\Gamma$ be any valued Boolean constraint 
language. 

If we restrict $\Gamma$ by only allowing functions with range $\{0, \infty\}$, 
as in Example 1, then each problem VCSP($\Gamma$) corresponds precisely to a 
classical Boolean constraint satisfaction problem. Such problems are sometimes 
known as Generalized Satisfiability problems [3]. The complexity of 
VCSP($\Gamma$) for such restricted sets $\Gamma$ has been completely 
characterized, and the six tractable cases have been identified [3,2]. 

Alternatively, if we restrict $\Gamma$ by only allowing functions whose range 
has exactly two finite values including 0, as in Example 2, then each 
VCSP($\Gamma$) corresponds precisely to a standard Weighted MAX-SAT problem [2], 
in which the aim is to find an assignment in which the total weight of satisfied 
clauses is maximized. The complexity of VCSP($\Gamma$) for such restricted sets 
$\Gamma$ has been completely characterized, and the three tractable cases have 
been identified (see Theorem 7.6 of [2]). 

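We note, in particular, that when $\Gamma$ contains just the single binary 
function $\phi_{XOR}$ defined by 

$$\phi_{XOR}(x, y) = \begin{cases} 0 & \text{if } x \neq y \\ 1 & \text{otherwise} \end{cases}$$

then VCSP($\Gamma$) corresponds to the MAX-SAT problem for the exclusive-or 
predicate, which is known to be NP-hard (see Lemma 7.4 of [2]). 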

In an earlier paper [4], we introduced the idea of expressing a desired cost 
function by using a combination of available functions. The next two definitions 
formalize this idea. 

Definition 5. For any VCSP instance $\mathcal{P} = (V, D, C, \chi)$, and any 
tuple of distinct variables $W = (v_1, \ldots, v_k)$, the cost function for 
$\mathcal{P}$ on $W$, denoted $\Phi_{\mathcal{P}}^{W}$, is defined as follows: 

$$\Phi_{\mathcal{P}}^{W}(d_1, \ldots, d_k) = \min \{ Cost_{\mathcal{P}}(s) \mid s : V \to D, \ (s(v_1), \ldots, s(v_k)) = (d_1, \ldots, d_k) \}$$

Definition 6. A function $\phi$ is expressible over a valued constraint language 
$\Gamma$ if there exists an instance $\mathcal{P} = (V, D, C, \chi)$ in 
VCSP($\Gamma$) and a list $W$ of variables from $V$ such that 
$\phi = \Phi_{\mathcal{P}}^{W}$. 

The set of all functions expressible over a valued constraint language $\Gamma$ 
is denoted $\Gamma^*$. 
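
To make Definitions 5 and 6 concrete, the sketch below computes the cost 
function of an instance on a list of variables by minimising over all 
extensions of each partial assignment. The function name `expressed` and the 
example gadget are assumptions introduced for illustration; the example shows 
that chaining two disequality constraints through a hidden variable `z` 
expresses equality on `(x, y)`. 

```python
from itertools import product
from math import inf


def expressed(variables, constraints, W, domain=(0, 1)):
    """Return the cost function of the instance on W as a dict that maps each
    tuple of values for W to the minimal cost of an assignment extending it."""
    table = {d: inf for d in product(domain, repeat=len(W))}
    for values in product(domain, repeat=len(variables)):
        s = dict(zip(variables, values))
        c = sum(phi(*(s[v] for v in scope)) for scope, phi in constraints)
        key = tuple(s[v] for v in W)
        table[key] = min(table[key], c)
    return table


# Disequality with costs in {0, infinity}.
neq = lambda a, b: 0 if a != b else inf

# x != z and z != y, projected onto (x, y), forces x = y.
eq = expressed(["x", "y", "z"],
               [(("x", "z"), neq), (("z", "y"), neq)],
               ("x", "y"))
```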

The notion of expressibility has already been shown to be a key tool in analyzing 
the complexity of valued constraint languages, as the next result indicates. 

Proposition 1 ([4]). Let $\Gamma$ and $\Gamma'$ be valued constraint languages, 
with $\Gamma'$ finite. If $\Gamma' \subseteq \Gamma^*$, then VCSP($\Gamma'$) is 
polynomial-time reducible to VCSP($\Gamma$). 

The next result shows how Proposition 1 can be used to establish NP-hardness 
of a valued Boolean constraint language. 

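Corollary 1. Let $\Gamma$ be a valued Boolean constraint language, with costs 
in $\chi$. If $\Gamma^*$ contains a binary function $\phi_{XOR+}$ defined by 

$$\phi_{XOR+}(x, y) = \begin{cases} \alpha & \text{if } x \neq y \\ \beta & \text{otherwise} \end{cases}$$

for some $\alpha < \beta < \infty$, then VCSP($\Gamma$) is NP-hard. 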

Proof. Lemma 7.4 of [2] states that the Boolean problem VCSP($\{\phi_{XOR}\}$) 
is NP-hard, where $\phi_{XOR}$ is the Boolean exclusive-or function, as defined 
in Example 3. Since adding a constant to all cost functions, and scaling all 
costs by a constant factor, does not affect the difficulty of solving a VCSP 
instance, we conclude that VCSP($\{\phi_{XOR+}\}$) is also NP-hard. □ 

A similar notion of expressibility has been used extensively for classical con- 
straints, which are specified using relations, rather than cost functions [5-7]. It 
has been shown that the relations expressible by a given set of relations are de- 
termined by certain algebraic invariance properties of those relations, known as 
polymorphisms [5, 6, 8]. A polymorphism of $R$, as defined in [5, 8], is a 
function $f : D^k \to D$, for some $k$, with the property that whenever 
$t_1, \ldots, t_k$ are in $R$ then so is 
$(f(t_1[1], \ldots, t_k[1]), \ldots, f(t_1[m], \ldots, t_k[m]))$. 

The concept of a polymorphism is specific to relations, and cannot be applied 
directly to the functions in a valued constraint language. However, we now define 
a more general notion, introduced in [4], which we call a multimorphism, and 
which does apply directly to functions. 

Definition 7. Let $D$ be a set, $\chi$ a valuation structure, and 
$\phi : D^m \to \chi$ a function. 

We extend the definition of $\phi$ in the following way: for any positive 
integer $k$, and any list of $k$-tuples, $t_1, t_2, \ldots, t_m$, over $D$, we 
define 

$$\phi(t_1, t_2, \ldots, t_m) = \sum_{i=1}^{k} \phi(t_1[i], t_2[i], \ldots, t_m[i])$$

We say that $F : D^k \to D^k$ is a multimorphism of $\phi$ if, for any list of 
$k$-tuples $t_1, t_2, \ldots, t_m$ over $D$ we have 

$$\phi(F(t_1), F(t_2), \ldots, F(t_m)) \leq \phi(t_1, t_2, \ldots, t_m).$$

For any valued constraint language $\Gamma$ we will say that $\Gamma$ has a 
multimorphism $F$ if and only if $F$ is a multimorphism of $\phi$ for each 
$\phi \in \Gamma$. 
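
The inequality in Definition 7 can be checked exhaustively for a small cost 
function. The following sketch is an illustration only: `is_multimorphism`, 
`cut` and `xor` are hypothetical names, `F` is passed as a list of its $k$ 
component functions, and the check enumerates all lists of $m$ $k$-tuples, so 
it is practical only for small arities. 

```python
from itertools import product
from math import inf


def is_multimorphism(F, phi, m, k, domain=(0, 1)):
    """Check phi(F(t_1), ..., F(t_m)) <= phi(t_1, ..., t_m) for every list of
    m k-tuples, where phi is extended to k-tuples by summing over positions.
    F is a list of k component functions, each mapping D^k to D."""
    for ts in product(product(domain, repeat=k), repeat=m):
        images = [tuple(Fi(*t) for Fi in F) for t in ts]
        lhs = sum(phi(*(img[i] for img in images)) for i in range(k))
        rhs = sum(phi(*(t[i] for t in ts)) for i in range(k))
        if lhs > rhs:
            return False
    return True


# Two binary cost functions used as examples.
cut = lambda a, b: 0 if a == b else 1   # penalises disagreement
xor = lambda a, b: 1 if a == b else 0   # penalises agreement
```

Here `is_multimorphism([min, max], cut, 2, 2)` tests whether `cut` has the 
multimorphism (min, max), and the same call with `xor` tests the exclusive-or 
cost function. 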

The following result establishes that multimorphisms have the key property 
that they extend to all functions expressible over a given language. 

Theorem 1 ([4]). Every multimorphism of a valued constraint language $\Gamma$ 
is also a multimorphism of $\Gamma^*$. 

In the rest of the paper we will usually denote a multimorphism 
$F : D^k \to D^k$ by listing explicitly the $k$ separate component functions 
$F_i : D^k \to D$, given by 
$F_i(x_1, x_2, \ldots, x_k) = F(x_1, x_2, \ldots, x_k)[i]$. 

It is shown in [4] that if $F : D^k \to D^k$ is a multimorphism of a function 
$\phi$, then each of the component functions, $F_i$, is a polymorphism of the 
corresponding feasibility relation, Feas($\phi$), which is defined as follows: 

Definition 8. For any function $\phi$, with arity $m$, we define a relation 
known as the feasibility relation of $\phi$, and denoted Feas($\phi$), as 
follows: 

$$(x_1, x_2, \ldots, x_m) \in \text{Feas}(\phi) \iff \phi(x_1, x_2, \ldots, x_m) < \infty.$$

3 Dichotomy Theorem 

In this section we will show that in the Boolean case all the tractable valued 
constraint languages are precisely characterized by the presence of certain forms 
of multimorphism. In fact we establish a dichotomy result: if a valued constraint 
language has one of eight types of multimorphism then it is tractable, otherwise 
it is NP-hard. 

Theorem 2. For any valued Boolean constraint language $\Gamma$, if $\Gamma$ has 
one of the following multimorphisms then VCSP($\Gamma$) is tractable: 

1. (0), where 0 is the constant unary function returning the value 0; 

2. (1), where 1 is the constant unary function returning the value 1; 

3. (max, max), where max is the binary function returning the maximum of its 
arguments (i.e., $\max(x, y) = x \vee y$); 

4. (min, min), where min is the binary function returning the minimum of its 
arguments (i.e., $\min(x, y) = x \wedge y$); 

5. (min, max); 

6. (Mjty, Mjty, Mjty), where Mjty is the ternary majority function (i.e., it 
satisfies the identity Mjty$(x, x, y)$ = Mjty$(x, y, x)$ = Mjty$(y, x, x) = x$); 

7. (Mnty, Mnty, Mnty), where Mnty is the ternary minority function (i.e., it 
satisfies the identity Mnty$(x, x, y)$ = Mnty$(x, y, x)$ = Mnty$(y, x, x) = y$); 

8. (Mjty, Mjty, Mnty). 

In all other cases VCSP($\Gamma$) is NP-hard. 
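
For a small finite language, the condition of Theorem 2 can be tested by brute 
force. The sketch below is a hypothetical helper, not an algorithm from the 
paper: it lists which of the eight multimorphisms are shared by every cost 
function in a language given as `(function, arity)` pairs. The names 
`CANDIDATES`, `has_multimorphism` and `tractable_witnesses` are assumptions, 
and the running time is exponential in the arity. 

```python
from itertools import product
from math import inf


def mjty(x, y, z):
    """Ternary Boolean majority."""
    return int(x + y + z >= 2)


def mnty(x, y, z):
    """Ternary Boolean minority (sum modulo 2)."""
    return x ^ y ^ z


# The eight multimorphisms of Theorem 2, as lists of component functions.
CANDIDATES = {
    "(0)": [lambda x: 0],
    "(1)": [lambda x: 1],
    "(max, max)": [max, max],
    "(min, min)": [min, min],
    "(min, max)": [min, max],
    "(Mjty, Mjty, Mjty)": [mjty, mjty, mjty],
    "(Mnty, Mnty, Mnty)": [mnty, mnty, mnty],
    "(Mjty, Mjty, Mnty)": [mjty, mjty, mnty],
}


def has_multimorphism(F, phi, m):
    """Exhaustively check the multimorphism inequality for a Boolean phi."""
    k = len(F)
    for ts in product(product((0, 1), repeat=k), repeat=m):
        lhs = sum(phi(*(Fi(*t) for t in ts)) for Fi in F)
        rhs = sum(phi(*(t[i] for t in ts)) for i in range(k))
        if lhs > rhs:
            return False
    return True


def tractable_witnesses(language):
    """Names of the listed multimorphisms shared by all (phi, arity) pairs."""
    return [name for name, F in CANDIDATES.items()
            if all(has_multimorphism(F, phi, m) for phi, m in language)]


cut = lambda a, b: 0 if a == b else 1   # penalises disagreement
xor = lambda a, b: 1 if a == b else 0   # penalises agreement
```

An empty result from `tractable_witnesses` means that, by Theorem 2, the 
language is NP-hard; a non-empty result names a multimorphism that guarantees 
tractability. 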

We shall prove this result using a series of lemmas in Sections 3.1 and 3.2. 

A cost function $\phi$ will be called essentially classical if $\phi$ takes at 
most one finite value, that is, there is some value $\alpha$ such that 
$\phi(x) = \beta < \infty \Rightarrow \beta = \alpha$. Any valued constraint 
language $\Gamma$ containing essentially classical cost functions only will be 
called an essentially classical language. Note that when $\Gamma$ is an 
essentially classical language any assignment with finite cost has the same 
cost as any other assignment with finite cost. Hence we can solve any instance 
of VCSP($\Gamma$) for such languages by solving the corresponding classical 
constraint satisfaction problem in which each valued constraint 
$(\sigma, \phi)$ is replaced by the classical constraint 
$(\sigma, \text{Feas}(\phi))$ (see Definition 8). Hence the complexity of any 
essentially classical valued Boolean constraint language can be determined using 
Schaefer’s Dichotomy Theorem for classical Boolean constraints [3,6]. We will 
use this observation a number of times in the course of the proof. 

3.1 Tractable Cases 

To establish the first part of Theorem 2, we must show that a valued Boolean 
constraint language which has one of the eight types of multimorphisms listed 
in the theorem is tractable. 

We first note that the tractability of any valued constraint language (not nec- 
essarily Boolean) which has a multimorphism of one of the first two types listed 
in Theorem 2 was established in Theorem 2 of [4]. Furthermore, the tractability 
of any valued constraint language (not necessarily Boolean) which has a multi- 
morphism of the third type listed in Theorem 2 was established in Theorem 4 
of [4], and a symmetric argument (with the ordering reversed) establishes the 
tractability of any valued constraint language with a multimorphism of the fourth 
type. Finally, the tractability of any valued Boolean constraint language which 
has a multimorphism of the last type listed in Theorem 2 follows immediately 
from Theorem 5 of [4]. 

Hence, for the first part of the proof we only need to establish the tractabil- 
ity of valued constraint languages having one of the remaining three types of 
multimorphisms listed in Theorem 2. This is done in the next three lemmas. 

Lemma 1. Any valued Boolean constraint language which has the multimor- 
phism (Mjty, Mjty, Mjty) is essentially classical, and tractable. 

Proof. Let $\phi$ be a $k$-ary cost function which has the multimorphism 
(Mjty, Mjty, Mjty). It follows from the definition of the Mjty function and the 
definition of a multimorphism that for all $x, y \in D^k$, 
$3\phi(x) \leq \phi(x) + \phi(x) + \phi(y)$ and 
$3\phi(y) \leq \phi(y) + \phi(y) + \phi(x)$. Hence, if both $\phi(x)$ and 
$\phi(y)$ are finite, then we have $\phi(x) \leq \phi(y)$ and 
$\phi(y) \leq \phi(x)$, so they must be equal. Hence $\phi$ is essentially 
classical, so $\Gamma$ is essentially classical. 

Furthermore, since for each $\phi \in \Gamma$, Feas($\phi$) has the polymorphism 
Mjty, it follows from Schaefer’s Dichotomy Theorem [3, 6] that VCSP($\Gamma$) is 
tractable. □ 

Lemma 2. Any valued Boolean constraint language which has the multimor- 
phism (Mnty, Mnty, Mnty) is essentially classical, and tractable. 

Proof. Let $\phi$ be a $k$-ary cost function which has the multimorphism 
(Mnty, Mnty, Mnty). It follows from the definition of the Mnty function and the 
definition of a multimorphism that for all $x, y \in D^k$, 
$3\phi(x) \leq \phi(x) + \phi(y) + \phi(y)$ and 
$3\phi(y) \leq \phi(y) + \phi(x) + \phi(x)$. Hence, if both $\phi(x)$ and 
$\phi(y)$ are finite, then we have $\phi(x) \leq \phi(y)$ and 
$\phi(y) \leq \phi(x)$, so they must be equal. Hence $\phi$ is essentially 
classical, so $\Gamma$ is essentially classical. 

Furthermore, since for each $\phi \in \Gamma$, Feas($\phi$) has the polymorphism 
Mnty, and the Mnty operation is the affine operation over the field with 2 
elements, it follows from Schaefer’s Dichotomy Theorem [3, 6] that 
VCSP($\Gamma$) is tractable. □ 

The only remaining case for which we need to establish tractability is for 
valued Boolean constraint languages which have the multimorphism (min, max). It 
was shown in [4] that valued constraint languages with this multimorphism are 
closely connected with the submodular set functions used in economics and 
operations research [9]. Because of this connection, we shall refer to valued 
constraint languages with the multimorphism (min, max) as submodular languages. 

It was established in Theorem 3 of [4] that finite-valued submodular lan- 
guages are tractable. We now generalize this result in the Boolean case to include 
all submodular languages, including those where the cost functions take infinite 
values. 

Lemma 3. Any submodular valued Boolean constraint language is tractable. 

Proof. Let $\Gamma$ be a submodular valued Boolean constraint language, and let 
$\mathcal{P}$ be any instance of VCSP($\Gamma$). 

Every cost function in $\mathcal{P}$ has the multimorphism (min, max), and so 
the corresponding feasibility relation has the polymorphisms min and max. Since 
the polymorphism Mjty can be obtained by composition from min and max, 
it follows that each of these feasibility relations has the polymorphism Mjty, 
and hence is decomposable into binary relations [10]. Hence we can determine 
whether or not $\mathcal{P}$ has a solution with finite cost by solving a 
corresponding instance of 2-SAT, which can be solved in polynomial time. 

If $\mathcal{P}$ has any finite cost solution, then we can find a solution with 
minimal cost by solving a submodular minimisation problem over the set of all 
solutions allowed by the feasibility relations of the constraints of 
$\mathcal{P}$. This set has the polymorphisms min and max, and hence forms a 
distributive lattice. A polynomial-time algorithm for minimising a submodular 
function over a distributive lattice is given in [11]. □ 

3.2 Intractable Cases 

To establish the remaining part of Theorem 2, we must show that a valued 
Boolean constraint language which does not have any of the types of multimor- 
phisms listed in the theorem is NP-hard. We first deal with essentially classical 
languages. 

Lemma 4. Any valued Boolean constraint language which is essentially classical 
and does not have any of the multimorphisms listed in Theorem 2 is NP-hard. 

Proof. If we replace each cost function $\phi$ in $\Gamma$ with the relation 
Feas($\phi$) then we obtain a classical Boolean constraint language $\Gamma'$ 
which does not have any of the polymorphisms 0, 1, min, max, Mjty or Mnty. 

By Schaefer’s Dichotomy Theorem [3, 6], $\Gamma'$ is NP-complete, and hence 
$\Gamma$ is NP-hard. □ 

For the remaining languages, our strategy will be to show that any language 
which does not have one of the multimorphisms listed in Theorem 2 can express 
certain special functions, which we now define. 

Definition 9. A unary cost function $\sigma$ is a 0-selector if 
$\sigma(0) < \sigma(1)$ and it is a finite 0-selector if, in addition, 
$\sigma(1) < \infty$. A (finite) 1-selector is defined analogously. A selector 
is either a 1-selector or a 0-selector. 

— A binary cost function $\phi$ is a NEQ function if 

$$\phi(0, 1) = \phi(1, 0) < \phi(1, 1) = \phi(0, 0) = \infty.$$

— A binary cost function $\phi$ is an XOR function if 

$$\phi(0, 1) = \phi(1, 0) < \phi(1, 1) = \phi(0, 0) < \infty.$$
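
The three kinds of special function in Definition 9 translate directly into 
predicates. The following Python sketch is illustrative only; the function 
names are hypothetical, and costs are modelled as Python numbers with 
`math.inf` standing for the infinite cost. 

```python
from math import inf


def is_selector(sigma, value, finite=False):
    """True if the unary cost function sigma is a `value`-selector, i.e. it
    is strictly cheaper on `value` than on the other Boolean value. With
    finite=True the larger cost must also be finite."""
    low, high = sigma(value), sigma(1 - value)
    return low < high and (not finite or high < inf)


def is_neq(phi):
    """NEQ function: equal cost on (0,1) and (1,0), infinite on (0,0), (1,1)."""
    return phi(0, 1) == phi(1, 0) < phi(1, 1) == phi(0, 0) == inf


def is_xor(phi):
    """XOR function: as NEQ, but the cost on (0,0) and (1,1) is finite."""
    return phi(0, 1) == phi(1, 0) < phi(1, 1) == phi(0, 0) < inf
```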

Lemma 5. Let $\Gamma$ be a valued Boolean constraint language which is not 
essentially classical. 

If $\Gamma^*$ contains a NEQ function, then either $\Gamma^*$ contains both a 
finite 0-selector and a finite 1-selector, or else $\Gamma^*$ contains an XOR 
function. 

Proof. Let $\nu \in \Gamma^*$ be a NEQ function. 

First we show that if $\Gamma^*$ contains a finite 0-selector $\sigma_0$, then 
it also contains a finite 1-selector. To see this simply construct the instance 
$\mathcal{P}_0$ with variables $\{x, y\}$ and constraints 
$\{((x), \sigma_0), ((x, y), \nu)\}$, and note that 
$\Phi_{\mathcal{P}_0}^{(y)}$ is a finite 1-selector. Similarly, if $\Gamma^*$ 
contains a finite 1-selector, then it also contains a finite 0-selector. 

Now let $\zeta \in \Gamma$ be a cost function of arity $m$ which is not 
essentially classical. Choose tuples $u, v$ such that $\zeta(u)$ and $\zeta(v)$ 
are as small as possible with $\zeta(u) < \zeta(v) < \infty$. Let $\mathcal{P}$ 
be the VCSP instance with four variables: $\{x_{00}, x_{01}, x_{10}, x_{11}\}$, 
and three constraints: 

$$((x_{u[1]v[1]}, \ldots, x_{u[m]v[m]}), \zeta), \quad ((x_{00}, x_{11}), \nu), \quad ((x_{01}, x_{10}), \nu).$$

Let $W = (x_{01}, x_{11})$, and $\psi = \Phi_{\mathcal{P}}^{W}$. 

Note that $\psi(0, 1) = \zeta(u) + 2\nu(0, 1)$ and 
$\psi(1, 1) = \zeta(v) + 2\nu(0, 1)$. If $\psi(0, 1) \neq \psi(1, 0)$, then, by 
the choice of $u$, $\psi(0, 1) < \psi(1, 0)$, and 
$\psi(0, 1) < \psi(1, 1) < \infty$, so $\Phi_{\mathcal{P}}^{(x_{01})}$ is a 
finite 0-selector. 

Hence we may assume that $\psi(0, 1) = \psi(1, 0)$. If 
$\psi(0, 0) \neq \psi(1, 1)$, then if $\psi(0, 0) < \infty$ the function 
$\psi(x, x)$ is a finite selector, and hence $\Gamma^*$ contains both a finite 
0-selector and a finite 1-selector. On the other hand, if 
$\psi(0, 0) = \infty$ then construct the instance $\mathcal{P}_2$ with variables 
$\{x, y\}$ and constraints $\{((x, x), \psi), ((x, y), \psi)\}$. In this case 
$\Phi_{\mathcal{P}_2}^{(y)}$ is a finite 0-selector, and hence $\Gamma^*$ again 
contains both a finite 0-selector and a finite 1-selector. 

Otherwise we may assume that $\psi(0, 1) = \psi(1, 0)$ and 
$\psi(0, 0) = \psi(1, 1)$. By construction, we have 
$\psi(0, 1) = \zeta(u) + 2\nu(0, 1) < \zeta(v) + 2\nu(0, 1) = \psi(1, 1) < \infty$. 
So in this case $\psi$ is an XOR function. □ 
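
The four-variable gadget used in the proof of Lemma 5 can be checked 
numerically for a particular cost function. The sketch below is a hypothetical 
illustration: the helper name `lemma5_gadget` and the example values of `zeta` 
and `nu` are assumptions, and the cost function on $W = (x_{01}, x_{11})$ is 
obtained by exhaustive minimisation over the remaining variables. 

```python
from itertools import product
from math import inf


def lemma5_gadget(zeta, u, v, nu):
    """Build the instance with variables x00, x01, x10, x11 and constraints
    ((x_{u[1]v[1]}, ..., x_{u[m]v[m]}), zeta), ((x00, x11), nu),
    ((x01, x10), nu), and return its cost function on (x01, x11) as a dict."""
    names = ["x00", "x01", "x10", "x11"]
    scope = tuple(f"x{a}{b}" for a, b in zip(u, v))
    constraints = [(scope, zeta), (("x00", "x11"), nu), (("x01", "x10"), nu)]
    psi = {}
    for values in product((0, 1), repeat=4):
        s = dict(zip(names, values))
        c = sum(phi(*(s[x] for x in sc)) for sc, phi in constraints)
        key = (s["x01"], s["x11"])
        psi[key] = min(psi.get(key, inf), c)
    return psi


# Example values (assumed for illustration): zeta(u) = 1 < zeta(v) = 2.
zeta_table = {(0, 0): 5, (0, 1): 1, (1, 0): 2, (1, 1): inf}
zeta = lambda a, b: zeta_table[(a, b)]
nu = lambda a, b: 3 if a != b else inf          # a NEQ function
psi = lemma5_gadget(zeta, (0, 1), (1, 0), nu)
```

For these values `psi[(0, 1)]` equals `zeta(u) + 2 * nu(0, 1)` and 
`psi[(1, 1)]` equals `zeta(v) + 2 * nu(0, 1)`, matching the identities stated 
in the proof. 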

Lemma 6. Let $\Gamma$ be a valued Boolean constraint language which is not 
essentially classical, and does not have either of the multimorphisms (0) or (1). 

Either $\Gamma^*$ contains a 0-selector and a 1-selector, or else $\Gamma^*$ 
contains an XOR function. 

Proof. Let $\phi_0 \in \Gamma$ be a cost function which does not have the 
multimorphism (0), and $\phi_1 \in \Gamma$ be a cost function which does not 
have the multimorphism (1), and let $m$ be the arity of $\phi_0$. Choose a tuple 
$r$ such that $\phi_0(r)$ is the minimal value of $\phi_0$. By the choice of 
$\phi_0$, we have $\phi_0(r) < \phi_0(0, 0, \ldots, 0)$. 

Suppose first that $\Gamma^*$ contains a 0-selector $\sigma_0$. Let $M$ be a 
finite natural number which is larger than all finite costs in the range of 
$\phi_0$. We construct the instance $\mathcal{P} \in$ VCSP($\Gamma^*$) with two 
variables $\{x_0, x_1\}$, and two constraints 
$((x_{r[1]}, \ldots, x_{r[m]}), \phi_0)$ and $((x_0), M\sigma_0)$. It is 
straightforward to check that 
$\Phi_{\mathcal{P}}^{(x_1)}(1) < \Phi_{\mathcal{P}}^{(x_1)}(0)$, and so in this 
case $\Gamma^*$ contains a 1-selector. A similar argument, using $\phi_1$, shows 
that if $\Gamma^*$ contains a 1-selector, then it also contains a 0-selector. 

Hence, we need to show that either $\Gamma^*$ contains a selector, or it 
contains an XOR function. If $\phi_0(0, \ldots, 0) \neq \phi_0(1, \ldots, 1)$ 
then the unary cost function $\sigma(x) = \phi_0(x, \ldots, x)$ in $\Gamma^*$ is 
clearly a selector, and the result holds. 

Otherwise, we construct the instance $\mathcal{P}' \in$ VCSP($\Gamma$) with two 
variables $\{x_0, x_1\}$ and the single constraint 
$((x_{r[1]}, \ldots, x_{r[m]}), \phi_0)$. Now, by considering the costs of all 
four possible assignments, we can verify that either 
$\Phi_{\mathcal{P}'}^{(x_0)}$ or $\Phi_{\mathcal{P}'}^{(x_1)}$ is a selector, or 
else $\nu = \Phi_{\mathcal{P}'}^{(x_0, x_1)}$ is either an XOR function, or a 
NEQ function. 

If $\nu$ is an XOR function we are done, otherwise we appeal to Lemma 5 to 
complete the proof. □ 

Many of the remaining lemmas in this Section use the following construction 
which combines a given cost function cj) of arbitrary arity with a pair of selectors, 
in order to express a binary cost function with some similar properties. 

Construction 1. Let $\phi$ be any $m$-ary cost function which is not identically 
infinite, and let $\sigma_0$ be a 0-selector and $\sigma_1$ a 1-selector. Let 
$u, v$ be two $m$-tuples, and let $M$ be a natural number larger than all finite 
costs in the range of $\phi$. Let $\mathcal{P}$ be a VCSP instance with 
variables $\{x_{00}, x_{01}, x_{10}, x_{11}\}$, and constraints: 

$$((x_{u[1]v[1]}, \ldots, x_{u[m]v[m]}), \phi), \quad ((x_{00}), M\sigma_0), \quad ((x_{11}), M\sigma_1).$$

The binary cost function 
$\phi_2 \stackrel{\text{def}}{=} \Phi_{\mathcal{P}}^{(x_{01}, x_{10})}$ will be 
called a compression of $\phi$ by $u$ and $v$. 
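
Construction 1 can be sketched in Python by building the four-variable 
instance and minimising over $x_{00}$ and $x_{11}$. This is an illustrative 
sketch under stated assumptions: the name `compress`, the example cost 
function `square_of_ones`, and the particular selectors are all hypothetical. 
With $u = (0,0,1,1)$ and $v = (0,1,0,1)$ the scope of $\phi$ is 
$(x_{00}, x_{01}, x_{10}, x_{11})$, so the four values of the compression 
correspond to $\phi$ at $\min(u,v)$, $u$, $v$ and $\max(u,v)$ (shifted here by 
the selector costs, which are zero on the preferred values in this example). 

```python
from itertools import product
from math import inf


def compress(phi, u, v, sigma0, sigma1, M):
    """Compression of phi by u and v: the cost function on (x01, x10) of the
    instance with constraints ((x_{u[i]v[i]})_i, phi), ((x00), M*sigma0),
    ((x11), M*sigma1). Returned as a dict keyed by (x01, x10)."""
    names = ["x00", "x01", "x10", "x11"]
    scope = tuple(f"x{a}{b}" for a, b in zip(u, v))
    phi2 = {}
    for values in product((0, 1), repeat=4):
        s = dict(zip(names, values))
        c = (phi(*(s[x] for x in scope))
             + M * sigma0(s["x00"]) + M * sigma1(s["x11"]))
        key = (s["x01"], s["x10"])
        phi2[key] = min(phi2.get(key, inf), c)
    return phi2


# Assumed example: phi is the square of the number of ones (finite everywhere).
square_of_ones = lambda *t: sum(t) ** 2
sigma0 = lambda x: x          # 0-selector: sigma0(0) = 0 < sigma0(1) = 1
sigma1 = lambda x: 1 - x      # 1-selector: sigma1(1) = 0 < sigma1(0) = 1
phi2 = compress(square_of_ones, (0, 0, 1, 1), (0, 1, 0, 1),
                sigma0, sigma1, M=17)   # 17 exceeds the largest cost, 16
```

In this example the original function is not submodular on $u$ and $v$, and the 
resulting binary function `phi2` is not submodular either. 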

Lemma 7. A function $\phi$ has a (max, max) multimorphism if and only if 

— $\phi$ is finitely antitone, that is, for all tuples $u, v$ with 
$\phi(u), \phi(v) < \infty$, 

$$u \leq v \Rightarrow \phi(u) \geq \phi(v).$$

— Feas($\phi$) has the polymorphism max. 

Proof. If $\phi$ has a (max, max) multimorphism, then for all tuples $u, v$ we 
have $\phi(u) + \phi(v) \geq 2\phi(\max(u, v))$, so both conditions hold. 

Conversely, if $\phi$ does not have a (max, max) multimorphism, then there exist 
tuples $u, w$ such that $\phi(u) + \phi(w) < 2\phi(\max(u, w))$. Hence, without 
loss of generality, we may assume that $\phi(u) < \phi(\max(u, w))$. Setting 
$v = \max(u, w)$ we get $u \leq v$ and $\phi(u) < \phi(v)$. If 
$\phi(v) < \infty$ then the first condition does not hold, and if 
$\phi(v) = \infty$, then the second condition fails to hold. □ 
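
The equivalence in Lemma 7 can be confirmed by brute force on a small family of 
cost functions. The sketch below is an illustration, not a proof: it checks all 
binary Boolean cost functions with values in $\{0, 1, \infty\}$, and every 
function name in it is hypothetical. 

```python
from itertools import product
from math import inf

TUPLES = list(product((0, 1), repeat=2))


def vmax(u, v):
    """Componentwise maximum of two tuples."""
    return tuple(max(a, b) for a, b in zip(u, v))


def leq(u, v):
    """Componentwise order on tuples."""
    return all(a <= b for a, b in zip(u, v))


def has_max_max(phi):
    """(max, max) multimorphism: 2*phi(max(u,v)) <= phi(u) + phi(v)."""
    return all(2 * phi[vmax(u, v)] <= phi[u] + phi[v]
               for u in TUPLES for v in TUPLES)


def finitely_antitone(phi):
    """u <= v implies phi(u) >= phi(v) whenever both costs are finite."""
    return all(phi[u] >= phi[v] for u in TUPLES for v in TUPLES
               if leq(u, v) and phi[u] < inf and phi[v] < inf)


def feas_has_max(phi):
    """The feasibility relation is closed under componentwise max."""
    return all(phi[vmax(u, v)] < inf for u in TUPLES for v in TUPLES
               if phi[u] < inf and phi[v] < inf)


def lemma7_holds_on_small_functions(costs=(0, 1, inf)):
    """Compare both sides of Lemma 7 on every binary function over `costs`."""
    for values in product(costs, repeat=len(TUPLES)):
        phi = dict(zip(TUPLES, values))
        if has_max_max(phi) != (finitely_antitone(phi) and feas_has_max(phi)):
            return False
    return True
```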

Lemma 8. Let $\Gamma$ be a valued Boolean constraint language which is not 
essentially classical, and does not have any of the multimorphisms (0) or (1) or 
(max, max) or (min, min). 

Either $\Gamma^*$ contains a finite 0-selector and a finite 1-selector, or else 
$\Gamma^*$ contains an XOR function. 

Proof. Let $\phi$ be a cost function in $\Gamma$ which does not have a 
(max, max) multimorphism, and let $\psi$ be a cost function in $\Gamma$ which 
does not have a (min, min) multimorphism. 

By Lemma 6, either $\Gamma^*$ contains an XOR function and we have nothing to 
prove, or else $\Gamma^*$ contains a 0-selector, $\sigma_0$, and a 1-selector, 
$\sigma_1$. 

Since $\phi$ does not have a (max, max) multimorphism, it follows from Lemma 7 
that either $\phi$ is not finitely antitone, or else the relation Feas($\phi$) 
does not have the polymorphism max. 

For the first case, choose two tuples $u$ and $v$, with $u \leq v$ and 
$\phi(u) < \phi(v) < \infty$, and let $\phi_2$ be a compression of $\phi$ by $u$ 
and $v$ (see Construction 1). It is straightforward to check that 
$\phi_2(0, 0) < \phi_2(1, 1) < \infty$, which means that $\phi_2(x, x)$ is a 
finite 0-selector belonging to $\Gamma^*$. 

On the other hand suppose that $\phi$ is finitely antitone, and that $\Gamma^*$ 
contains a finite 1-selector $\tau$. In this case we know that Feas($\phi$) does 
not have the polymorphism max, so we can choose $u, v$ such that 
$\phi(u), \phi(v) < \infty$ and $\phi(\max(u, v)) = \infty$. Let $\phi_2$ be a 
compression of $\phi$ by $u$ and $v$, and construct the instance 
$\mathcal{P} \in$ VCSP($\Gamma^*$) with variables $\{x, y\}$, and constraints: 

$$((x, y), \phi_2), \quad ((y, x), \phi_2), \quad ((y), \tau).$$

The fact that $\phi$ is finitely antitone gives 
$\phi(u), \phi(v) \leq \phi(\min(u, v))$. This, together with the fact that 
$\phi(u)$ and $\phi(v)$ are finite whilst $\phi(\max(u, v))$ is infinite, is 
enough to show that $\Phi_{\mathcal{P}}^{(x)}$ is a finite 0-selector. 

So, we have shown that if $\Gamma^*$ contains a finite 1-selector, then it 
contains a finite 0-selector whether or not $\phi$ is finitely antitone. A 
symmetric argument, exchanging 0 and 1, max and min, and $\phi$ and $\psi$, 
shows that if $\Gamma^*$ contains a finite 0-selector, then it contains a finite 
1-selector. 

Hence, to complete the proof we may assume that $\Gamma^*$ contains no finite 
selectors. In this case we know that Feas($\phi$) does not have the polymorphism 
max and Feas($\psi$) does not have the polymorphism min, so we may choose tuples 
$u, v, w, z$ such that $\phi(u)$, $\phi(v)$, $\psi(w)$ and $\psi(z)$ are all 
finite, but $\phi(\max(u, v))$ and $\psi(\min(w, z))$ are both infinite. Now let 
$\phi_2$ be a compression of $\phi$ by $u$ and $v$, and $\psi_2$ a compression 
of $\psi$ by $w$ and $z$ (see Construction 1). We then have that 

$$\rho(x, y) \stackrel{\text{def}}{=} \phi_2(x, y) + \phi_2(y, x) + \psi_2(x, y) + \psi_2(y, x)$$

is a NEQ function which is contained in $\Gamma^*$. We can now appeal to Lemma 5 
to show that $\Gamma^*$ contains an XOR function, and we are done. □ 

Lemma 9. Let $\Gamma$ be a valued Boolean constraint language which does not 
have the multimorphism (min, max). 

If $\Gamma^*$ contains both a finite 0-selector and a finite 1-selector, then 
$\Gamma^*$ contains a NEQ function or an XOR function. 

Proof. Suppose that $\sigma_0 \in \Gamma^*$ is a finite 0-selector, 
$\sigma_1 \in \Gamma^*$ is a finite 1-selector, and $\phi \in \Gamma$ is 
non-submodular (i.e., $\phi$ does not have the multimorphism (min, max)). 

Set $\lambda = \sigma_0(1) - \sigma_0(0)$ and $\mu = \sigma_1(0) - \sigma_1(1)$. 

Choose $u, v$ such that 
$\phi(u) + \phi(v) < \phi(\max(u, v)) + \phi(\min(u, v))$. Let $\phi_2$ be a 
compression of $\phi$ by $u$ and $v$. It is straightforward to check that 
$\phi_2$ is also not submodular. 

There are three cases to consider: 

Case (1): $\phi_2(0, 0), \phi_2(1, 1) < \infty$. 

Construct the instance $\mathcal{P} \in$ VCSP($\Gamma^*$) with variables 
$\{x, y\}$, and constraints 

$$((x, y), 2\lambda\mu\phi_2),$$

$$((x), \lambda(\phi_2(1, 0) + \phi_2(1, 1))\sigma_1), \quad ((x), \mu(\phi_2(0, 0) + \phi_2(0, 1))\sigma_0),$$

$$((y), \lambda(\phi_2(0, 1) + \phi_2(1, 1))\sigma_1), \quad ((y), \mu(\phi_2(0, 0) + \phi_2(1, 0))\sigma_0).$$

If we set $W = (x, y)$, then it is straightforward to check that 
$\Phi_{\mathcal{P}}^{W}$ is an XOR function. 

Case (2): Exactly one of $\phi_2(0, 0)$ and $\phi_2(1, 1)$ is finite. 

First suppose that $\phi_2(0, 0) = \infty > \phi_2(1, 1)$. Let 
$\alpha = \max\{\phi_2(0, 1) + \phi_2(1, 0) - 2\phi_2(1, 1) + 1, 0\}$, and 
construct an instance $\mathcal{P}_2 \in$ VCSP($\Gamma^*$) with variables 
$\{x, u, v, y\}$, and constraints 

$$((x, u), \mu\phi_2), \quad ((u, x), \mu\phi_2),$$

$$((u, v), \mu\phi_2), \quad ((v, u), \mu\phi_2),$$

$$((v, y), \mu\phi_2), \quad ((y, v), \mu\phi_2),$$

$$((x), \alpha\sigma_1), \quad ((u), 2\alpha\sigma_1),$$

$$((v), 2\alpha\sigma_1), \quad ((y), \alpha\sigma_1).$$

If we set $W = (x, y)$, and $\eta = \Phi_{\mathcal{P}_2}^{W}$, then it is 
straightforward to verify that $\eta(0, 1) = \eta(1, 0)$, 
$\eta(0, 0) = \eta(1, 1)$, and 

$$\eta(0, 0) = \eta(0, 1) + \mu(\alpha + 2\phi_2(1, 1) - \phi_2(0, 1) - \phi_2(1, 0)),$$

and hence that $\eta$ is an XOR function. 

A symmetric argument clearly works when $\phi_2(1, 1) = \infty > \phi_2(0, 0)$. 

Case (3): $\phi_2(0, 0) = \phi_2(1, 1) = \infty$. 

In this case the function $\phi_2(x, y) + \phi_2(y, x)$ is a NEQ function which 
is clearly contained in $\Gamma^*$. □ 
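
The "straightforward to check" step in Case (1) of Lemma 9 can be illustrated 
numerically. In the sketch below the function name `case1_costs` and the 
example values of $\phi_2$, $\sigma_0$ and $\sigma_1$ are assumptions chosen 
for illustration; since $W = (x, y)$ contains every variable of the instance, 
the cost function on $W$ is simply the total cost of each assignment. 

```python
from itertools import product


def case1_costs(phi2, sigma0, sigma1):
    """Total cost of each assignment (x, y) in the Case (1) instance.
    phi2 maps (x, y) to a finite cost; sigma0 and sigma1 are finite 0- and
    1-selectors given as dicts {0: cost, 1: cost}."""
    lam = sigma0[1] - sigma0[0]
    mu = sigma1[0] - sigma1[1]
    a, b, c, d = phi2[0, 0], phi2[0, 1], phi2[1, 0], phi2[1, 1]

    def cost(x, y):
        return (2 * lam * mu * phi2[x, y]
                + lam * (c + d) * sigma1[x] + mu * (a + b) * sigma0[x]
                + lam * (b + d) * sigma1[y] + mu * (a + c) * sigma0[y])

    return {(x, y): cost(x, y) for x, y in product((0, 1), repeat=2)}


# Assumed example: phi2 is non-submodular, since 4 + 4 < 1 + 9.
phi2 = {(0, 0): 1, (0, 1): 4, (1, 0): 4, (1, 1): 9}
sigma0 = {0: 0, 1: 2}   # lambda = 2
sigma1 = {0: 3, 1: 0}   # mu = 3
eta = case1_costs(phi2, sigma0, sigma1)
```

For these values the costs of the two assignments with $x \neq y$ coincide and 
are strictly smaller than the (equal, finite) costs of the two assignments with 
$x = y$, so `eta` is an XOR function in the sense of Definition 9. 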



Lemma 10. A Boolean function $\phi$ has a (Mjty, Mjty, Mnty) multimorphism if 
and only if: 

— $\phi$ is finitely modular, that is, for all tuples $u, v$ with 
$\phi(u), \phi(v), \phi(\max(u, v)), \phi(\min(u, v)) < \infty$, 

$$\phi(u) + \phi(v) = \phi(\max(u, v)) + \phi(\min(u, v)).$$

— Feas($\phi$) has the polymorphisms Mjty and Mnty. 

Proof. Follows immediately from the characterization of valued Boolean 
constraint languages with a (Mjty, Mjty, Mnty) multimorphism given in Theorem 
4.17 of [12]. □ 

Lemma 11. Let $\Gamma$ be a valued Boolean constraint language which does not 
have the multimorphism (Mjty, Mjty, Mnty). 

If $\Gamma^*$ contains a finite 0-selector, a finite 1-selector, and a NEQ 
function, then $\Gamma^*$ contains an XOR function. 

Proof. Suppose that $\sigma_0 \in \Gamma^*$ is a finite 0-selector, 
$\sigma_1 \in \Gamma^*$ is a finite 1-selector, $\nu \in \Gamma^*$ is a NEQ 
function, and $\phi \in \Gamma$ does not have the multimorphism 
(Mjty, Mjty, Mnty). We have to show that $\Gamma^*$ also contains an XOR 
function. 

By Lemma 10 there are 2 cases: either $\phi$ is not finitely modular, or 
Feas($\phi$) does not have both polymorphisms Mjty and Mnty. 

In the first case, choose tuples $u, v$ such that 
$\phi(u) + \phi(v) \neq \phi(\min(u, v)) + \phi(\max(u, v))$. Let $\phi_2$ be a 
compression of $\phi$ by $u$ and $v$. It is straightforward to check that 
$\phi_2$ is also not finitely modular. Now construct the instance $\mathcal{P}$ 
with variables $\{w, x, y, z\}$, and constraints 

$$((x, w), \nu), \quad ((z, y), \nu), \quad ((x, z), \phi_2), \quad ((w, y), \phi_2).$$

It is straightforward to check that either $\Phi_{\mathcal{P}}^{(x, y)}$ or 
$\Phi_{\mathcal{P}}^{(x, z)}$ is an XOR function. 

Next, suppose that Feas($\phi$) has the polymorphism Mjty but not Mnty. In 
this case, by Theorem 3.5 of [10], Feas($\phi$) is decomposable into binary 
relations (in other words, it is equal to the relational join of its binary 
projections). Since Feas($\phi$) does not have the Mnty polymorphism, this 
implies that one of its binary projections does not have the Mnty polymorphism. 
The only binary Boolean relations which do not have the Mnty polymorphism have 
exactly three tuples. Therefore, by projection, it is possible to construct from 
$\phi$ a binary cost function $\psi$ such that exactly three of $\psi(0, 0)$, 
$\psi(0, 1)$, $\psi(1, 0)$, $\psi(1, 1)$ are finite. If $\psi(0, 1)$ or 
$\psi(1, 0)$ is infinite, then let $\eta$ be the projection onto variables 
$x, y$ of $\psi(x, v) + \nu(v, y)$, otherwise let $\eta = \psi$. The cost 
function $\eta$ is non-submodular and exactly one of $\eta(0, 0)$ and 
$\eta(1, 1)$ is infinite, and so, by Case 2 of Lemma 9, $\Gamma^*$ contains an 
XOR function. 

Suppose now that Feas($\phi$) has the polymorphism Mnty but not Mjty. Since 
Feas($\phi$) has the polymorphism Mnty, it is an affine relation [2] over the 
finite field with 2 elements, GF(2), and can be expressed as a system of linear 
equations over GF(2). Creignou et al. define a Boolean relation to be affine 
with width 2 if it can be expressed as a system of linear equations over GF(2), 
with at most two variables per equation [2]. In fact, linear equations over 
GF(2) with one variable correspond to the unary relations, and linear equations 
over GF(2) with two variables correspond to the binary equality and disequality 
relations. The unary relations, and the binary equality and disequality 
relations all have both the Mjty and Mnty polymorphisms. Thus Feas($\phi$) is 
affine but not of width 2. Hence, by Lemma 5.34 of [2], Feas($\phi$) can be used 
to construct the 4-ary affine constraint $w + x + y + z = 0$. In other words, 
there is some $\psi \in \Gamma^*$ such that $\psi(w, x, y, z) < \infty$ iff 
$w + x + y + z = 0$. 

Now set $\lambda = \psi(0, 0, 1, 1) + \psi(0, 1, 0, 1) + 1$ and construct the 
VCSP instance $\mathcal{P}$ with variables $\{w, x, y, z\}$, and constraints 

$$((w, x, y, z), \psi), \quad ((w), 3M\sigma_0), \quad ((z), \lambda\sigma_1)$$

where $M$ is a natural number larger than the square of any finite cost in the 
range of $\psi$ or $\sigma_1$. Let $\eta = \Phi_{\mathcal{P}}^{(x, y)}$. It is 
straightforward to verify that $\eta$ is a binary non-submodular function where 
both $\eta(0, 0)$ and $\eta(1, 1)$ are finite. Hence, by Case 1 of Lemma 9, the 
result follows in this case also. 

Finally, if Feas(φ) has neither the polymorphism Mnty nor Mjty, then the set 
of Boolean relations {Feas(φ), Feas(ν)} can be shown to have essentially unary 
polymorphisms only (see Theorem 4.12 of [7]). By Theorem 4.10 of [7], this 
implies that in this case Feas(φ) can again be used to construct the 4-ary affine 
constraint w + x + y + z = 0, and we can proceed as above. □ 

Lemma 12. Let Γ be a valued Boolean constraint language which does not have 
any of the multimorphisms listed in Theorem 2. 

Either Γ is essentially classical, or else Γ* contains an XOR function. 

Proof. Suppose that Γ is not essentially classical and has none of the multimorphisms 
listed in Theorem 2. By Lemmas 9 and 8, either Γ* contains an XOR 
function, or else Γ* contains a NEQ function and a finite 0-selector and a finite 
1-selector. In the latter case, by Lemma 11 we know that Γ* contains an XOR 
function. □ 

Combining Lemmas 4 and 12, together with Corollary 1, establishes the NP- 
hardness of any valued Boolean constraint language having none of the multi- 
morphisms listed in Theorem 2, and so completes the proof of Theorem 2. 

4 Some Special Cases 

Corollary 2. Let Γ be a valued Boolean constraint language where all costs 
are finite. If Γ has one of the multimorphisms ⟨0⟩, ⟨1⟩, or ⟨min, max⟩, then 
VCSP(Γ) is tractable. In all other cases VCSP(Γ) is NP-hard. 

Proof. Let φ be a cost function taking finite values only. By Lemma 7, if φ has the 
multimorphism ⟨max, max⟩, then φ is antitone, and hence has the multimorphism 
⟨1⟩. By a symmetric argument, if φ has the multimorphism ⟨min, min⟩, then φ 
is monotone, and hence has the multimorphism ⟨0⟩. By Lemma 1, if φ has the 
multimorphism ⟨Mjty, Mjty, Mjty⟩, then φ is constant, and hence has the multimorphism 
⟨0⟩. By Lemma 2, if φ has the multimorphism ⟨Mnty, Mnty, Mnty⟩, 
then φ is again constant, and hence has the multimorphism ⟨0⟩. By Lemma 10, 
if φ has the multimorphism ⟨Mjty, Mjty, Mnty⟩, then φ is modular, and hence 
submodular, that is, it has the multimorphism ⟨min, max⟩. 

The result now follows from Theorem 2. □ 
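
For small finite-valued languages, the three conditions of Corollary 2 can be tested directly. The following is a minimal sketch, assuming the usual multimorphism inequalities: ⟨c⟩ holds when the all-c tuple has minimum cost, and ⟨min, max⟩ holds when φ(min(s, t)) + φ(max(s, t)) ≤ φ(s) + φ(t) for all tuples s, t. The function names and the two example cost tables are illustrative, not taken from the paper.

```python
from itertools import product

def has_constant(phi, arity, c):
    """Multimorphism <c>: the all-c tuple costs no more than any other tuple."""
    base = phi[(c,) * arity]
    return all(base <= phi[t] for t in product((0, 1), repeat=arity))

def has_min_max(phi, arity):
    """Multimorphism <min, max>, i.e. submodularity of phi."""
    tuples = list(product((0, 1), repeat=arity))
    return all(
        phi[tuple(map(min, s, t))] + phi[tuple(map(max, s, t))] <= phi[s] + phi[t]
        for s, t in product(tuples, repeat=2)
    )

def classify(language):
    """Apply Corollary 2 to a list of (arity, cost table) pairs with finite costs only."""
    if all(has_constant(phi, k, 0) for k, phi in language):
        return "tractable: <0>"
    if all(has_constant(phi, k, 1) for k, phi in language):
        return "tractable: <1>"
    if all(has_min_max(phi, k) for k, phi in language):
        return "tractable: <min, max>"
    return "NP-hard"

# Hypothetical binary cost tables used only to exercise the three tests.
prefers_unequal = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 1}
submodular_only = {(0, 0): 1, (0, 1): 0, (1, 0): 2, (1, 1): 1}

print(classify([(2, prefers_unequal)]))
print(classify([(2, submodular_only)]))
```

A language has a multimorphism only if every cost function in it does, which is why each test quantifies over the whole list; the NP-hard label restates the second half of the corollary rather than proving anything.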

Using the construction given in Example 2, this immediately gives a dichotomy 
theorem for the Max-Sat problem for any Γ corresponding to a set of relations. 

Corollary 3. If Γ has one of the multimorphisms ⟨0⟩, ⟨1⟩, or ⟨min, max⟩, then 
Max-Sat(Γ) is tractable. In all other cases Max-Sat(Γ) is NP-hard. 

This result gives an alternative description to the one given in Theorem 7.6 of [2] 
for the three tractable cases of Max-Sat. 

Abstract. We give an approximate and often extremely fast method of 
solving a portfolio optimisation (PO) problem in financial mathematics, 
which has applications in the credit derivatives market. Its correspond- 
ing satisfaction problem is closely related to the balanced incomplete 
block design (BIBD) problem. However, typical PO instances are an or- 
der of magnitude larger than the largest BIBDs solved so far by global 
search. Our method is based on embedding sub-instances into the origi- 
nal instance. Their determination is itself a CSP. This allows us to solve a 
typical PO instance, with over 10^746 symmetries. The high quality of our 
approximate solutions can be assessed by comparison with a tight lower 
bound on the cost. Also, our solutions sufficiently improve the currently 
best ones so as to often make the difference between having or not having 
a feasible transaction due to investor and rating-agency constraints. 



1 Introduction 

The structured credit market has seen two new products over the last decade: 
credit derivatives and collateralised debt obligations (CDOs). These new products 
have created the ability to leverage and transform credit risk in ways not possible 
through the traditional bond and loan markets. 

CDOs typically consist of a special purpose vehicle that has credit exposure 
to around one hundred different issuers. Such vehicles purchase bonds and loans 
and other financial assets through the issuance of notes or obligations with vary- 
ing levels of risk. In a typical structure, credit losses in the underlying pool are 
allocated to the most subordinated obligations or notes first. A natural progression 
of the market has been to use notes from existing CDOs as assets in a 
new generation of CDOs, called CDO Squared or CDO of CDO [9]. 

The credit derivatives market has allowed a more efficient mechanism for 
creating CDO Squared. The idea is to use sub-pools of credit default swaps 
instead of notes. The sub-pools are chosen from a collection of credits with the 
level of liquidity and risk adequate to the potential investors. These transactions 
are sometimes labelled synthetic CDO Squared. 

In the creation of a synthetic CDO, the question naturally arises of how to 
maximise the diversification of the sub-pools given a limited universe of previously 
chosen credits. In a typical CDO Squared, the number of available credits 